Regex field extraction in props.conf: in which search-time field are the values stored?

edoardo_vicendo
Communicator

Hi All,

I have some questions about the regular expression extractions that can be added in props.conf.
Suppose I have indexed multi-line files in Splunk that, at some (not fixed) point, contain the following pattern, and I need to extract "nameoftheuser" and "nameofthejob":

= USER      : nameoftheuser AAA = JOB      : nameofthejob

I know I can do that in this way:

EXTRACT-USER = ^(?m)= USER      : (?P<USER>\w+)
EXTRACT-JOB  = ^(?m)= USER      : \w+ AAA = JOB      : (?P<JOB>\w+)

or even in that way:

EXTRACT-USER,JOB = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

My first question refers to the props.conf documentation.

Given that EXTRACT-USER is the <class> and (?P<USER>\w+) is the <regex>, is the extracted value stored under the class name or under the name given in the regex? To be clearer:

EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

At search time, will the USER and JOB values be stored in fields named TEST1 and TEST2, or in fields named USER and JOB?

My second question: I do not understand what exactly this passage in the props.conf documentation means:

Use '<regex> in <src_field>' to match the regex against the values of a
specific field. Otherwise it just matches against _raw (all raw event
data).

I understand it as advice for improving the performance of field extraction, but I do not see exactly how to take advantage of it. Can someone explain it to me?

Thanks a lot,
Edoardo

FrankVl
Ultra Champion

The data goes into the field as you label it in the capture group. So EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+) puts the data in the USER and JOB fields.

The <regex> in <src_field> form can be used if you already have fields extracted (e.g. at index time, or with preceding EXTRACT items). You can then apply a further extraction to those previously extracted fields. For example, a log might contain some header fields followed by a message, where the message itself contains further details. You can then define one extraction that gets the header and message fields, and a second extraction that takes the message field as input and extracts the further details from it.
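
A minimal sketch of that header-then-details idea might look like the following. The stanza name, the sample event, and all field names (severity, message, msg_user, msg_host) are hypothetical and only for illustration. The class names are prefixed a_ and b_ on the assumption that inline extractions are applied in lexicographical order of their class name, so that message already exists when the second extraction runs; please check the documentation for your Splunk version before relying on that.

```ini
# props.conf sketch (hypothetical sourcetype and field names)
# Assumed sample event:
#   ERROR login failed for user=alice from host=web01

[my_sourcetype]
# Step 1: no "in <src_field>", so the regex is matched against _raw.
# Captures the leading word as "severity" and the rest of the line as "message".
EXTRACT-a_header = ^(?P<severity>\w+)\s+(?P<message>.+)$

# Step 2: the trailing "in message" makes this regex run against the value
# of the message field instead of _raw.
EXTRACT-b_details = user=(?P<msg_user>\w+)\s+from\s+host=(?P<msg_host>\w+) in message
```

With the assumed sample event, message would hold "login failed for user=alice from host=web01", and the second extraction would read msg_user and msg_host out of that value. As in the question above, the class names (a_header, b_details) are only labels; the resulting fields are named after the capture groups.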

edoardo_vicendo
Communicator

Hi FrankVl,

Thanks a lot for your answer. Could you confirm whether the following regex is more efficient in terms of performance:

EXTRACT-TEST1,TEST2 = ^(?m)= USER      : (?P<USER>\w+) AAA = JOB      : (?P<JOB>\w+)

compared to the version below, which is split into two separate regexes:

EXTRACT-USER = ^(?m)= USER      : (?P<USER>\w+)
EXTRACT-JOB  = ^(?m)= USER      : \w+ AAA = JOB      : (?P<JOB>\w+)

Thanks a lot,
Edoardo

FrankVl
Ultra Champion

I would guess so, but I don't know enough of the nitty gritty technical details of how all that regex stuff works under the hood to give you an authoritative answer on that.