
EXTRACT from a specific field (using 'in' syntax) doesn't work without forcing '| extract reload=T'

Adam_Sealey
Explorer

I've been trying to do a search-time field extraction using an EXTRACT- setting in props.conf.

From the props.conf docs (http://docs.splunk.com/Documentation/Splunk/5.0.2/Admin/Propsconf), it appears that there are two ways to perform a search-time extraction using EXTRACT: either on the _raw field or on a specific field.

When I try to perform the field extraction on a specific field (using the 'in' syntax), the extraction doesn't run unless I specify '| extract reload=T'.

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname

When I remove the 'in questionname' portion of the extraction (so that the extraction runs on _raw), the extraction runs every time and doesn't require '| extract reload=T'.

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$

Has anyone else run into this problem? In this case I can rewrite my extraction to work on _raw, but I'm working with other cases where it would be very convenient to apply the regex to only one field.


Ayn
Legend

The problem is most likely that your first extraction runs before the questionname field has been extracted, so there's nothing to extract from. When you run "| extract reload=T" separately, that happens after all automatic extractions have already been applied, so the questionname field exists in that case.

Extractions are applied in alphabetical order of their names; whether that is per sourcetype or global, I forget. Either way, EXTRACT-a runs before EXTRACT-b, so if you have, for instance, EXTRACT-extractDomain and EXTRACT-questionname, you get exactly the problem you're seeing.
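
To make the ordering point concrete, here is a minimal Python sketch. It assumes the extractions run in plain character (ASCII) order of the class name, the part after "EXTRACT-"; the two names are the ones from the example above, and nothing here is Splunk code.

```python
# Class names are the part after "EXTRACT-" in props.conf.
# Assumption: extractions run in plain character (ASCII) order of these names.
classes = ["questionname", "extractDomain"]

# sorted() compares strings character by character, so 'e' sorts before 'q'.
run_order = sorted(classes)
print(run_order)

# True here means the dependent extraction runs before the field it needs exists.
runs_too_early = run_order.index("extractDomain") < run_order.index("questionname")
print(runs_too_early)
```

Under the same assumption the comparison is case-sensitive, so a prefix starting with an uppercase letter would sort before every lowercase name rather than after it.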


Adam_Sealey
Explorer

Exactly correct!

Using btool, I was able to see the order in which the extractions are applied, and it confirmed what you said:

EXTRACT-extractDomain = (?<domain>(?:(?:(?:[^\.]+\.)?(?<tld>(?:[^\.\s]{2})(?:(?:\.[^\.\s][^\.\s])|(?:[^\.\s]+)))))).$ in questionname
EXTRACT-opcode = (?<operation>[ R]) (?<opcode>.) \[(?<hexflags>[0-9A-Fa-f]+) (?<flags>....) (?<response>[^\]]+)\]
EXTRACT-protocol = (?<packetid>[0-9A-Fa-f]*) (?<protocol>UDP|TCP) (?<direction>\w+) (?<src_ip>[0-9A-Fa-f\.\:]+)\s+
EXTRACT-question1 = \] (?<questiontype>\w+)\s+(?<questionname>.*)
EXTRACT-question2 = \] (?<questionname>[^\s]*)$
EXTRACT-threadid = (?<threadid>[0-9A-Fa-f]+)\s+(?<context>PACKET)

When I renamed the extraction to zzExtractDomain, it worked great, because questionname has already been populated by the time it runs.

Thanks!
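
For anyone who wants to see why the rename helps, the following Python sketch imitates the behavior described in this thread. It is a toy model, not Splunk's implementation: the event text is hypothetical, the two regexes are simplified stand-ins for the real ones, and the ordering rule (sorted class names) is the assumption discussed above. Python spells named groups as (?P<name>...).

```python
import re

def apply_extractions(raw, extractions):
    """Apply {class_name: (pattern, source_field)} rules in sorted class-name order."""
    fields = {"_raw": raw}
    for name in sorted(extractions):
        pattern, source = extractions[name]
        value = fields.get(source)
        if value is None:
            continue  # source field not extracted yet: nothing to match against
        match = re.search(pattern, value)
        if match:
            fields.update(match.groupdict())
    return fields

# Simplified stand-ins for the real regexes in this thread.
QUESTION = (r"\] (?P<questionname>\S+)$", "_raw")
DOMAIN = (r"(?P<domain>[^.]+\.[^.]+)\.?$", "questionname")

raw = "PACKET UDP [0001 D NOERROR] www.example.com."  # hypothetical event

# Original name: extractDomain sorts before question2, so questionname is missing.
before = apply_extractions(raw, {"extractDomain": DOMAIN, "question2": QUESTION})
# Renamed: zzExtractDomain sorts after question2, so questionname already exists.
after = apply_extractions(raw, {"zzExtractDomain": DOMAIN, "question2": QUESTION})

print("domain" in before)
print(after.get("domain"))
```

The only difference between the two calls is the class name, which is enough to change whether the domain field is produced.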


