
How to write the regex to parse comma separated values for a single field in a log?

Path Finder

I need some help trying to parse a log that may have something like the following:

 192.168.x.x process: field_a (value_1, value_2,...)

Here value_1, value_2, and so on are all values for field_a. A single event can have anywhere from 1 to about 6 values for field_a. For example:

session_state (SYN, ACK)
or
session_state (SYN)

Any help would be nice. Right now I can only report everything between the parentheses as the value of field_a.


Re: How to write the regex to parse comma separated values for a single field in a log?

SplunkTrust

Assuming that the values are alphanumeric only, that there is at least one value and at most six, that the parentheses appear only around the value list in the event, and that you want to do this at search time with the rex command, this should work for you:

<your search> | rex field=_raw "\((?P<value_1>\w+)(?:,\s*(?P<value_2>\w+))?(?:,\s*(?P<value_3>\w+))?(?:,\s*(?P<value_4>\w+))?(?:,\s*(?P<value_5>\w+))?(?:,\s*(?P<value_6>\w+))?\)"
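A quick way to sanity-check this kind of pattern outside Splunk is Python's re module, which accepts the same (?P<name>...) named groups and (?:...) non-capturing groups as Splunk's PCRE. The sketch below assumes the intended pattern closes each optional ", value" group, so a short list leaves the later groups unset instead of reading past the ")". extract_values is a hypothetical helper name.

```python
import re

# Fixed-arity pattern: value_1 is required, value_2..value_6 are optional,
# and each optional group is closed so the final \) must follow the last value.
# \s* (not \s+) also accepts "a,b" with no space after the comma.
PATTERN = re.compile(
    r"\((?P<value_1>\w+)"
    r"(?:,\s*(?P<value_2>\w+))?"
    r"(?:,\s*(?P<value_3>\w+))?"
    r"(?:,\s*(?P<value_4>\w+))?"
    r"(?:,\s*(?P<value_5>\w+))?"
    r"(?:,\s*(?P<value_6>\w+))?"
    r"\)"
)

def extract_values(event):
    """Return only the named groups that actually matched (empty dict if no match)."""
    match = PATTERN.search(event)
    if match is None:
        return {}
    return {name: value for name, value in match.groupdict().items() if value is not None}

print(extract_values("session_state (SYN, ACK)"))  # {'value_1': 'SYN', 'value_2': 'ACK'}
print(extract_values("session_state (SYN)"))       # {'value_1': 'SYN'}
print(extract_values("signature ( Block List, Threat List)"))  # {}
```

The last case shows the limit of \w+: with a space after "(" or a multi-word value, nothing matches at all.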

Re: How to write the regex to parse comma separated values for a single field in a log?

Path Finder

A couple key notes:
- I am adding this regex as an extraction in props.conf for a TA.
- A single value can contain multiple words (e.g., signature ( Block List, Threat List, Traffic Misuse)).
- The values are all words, no numbers.

I tried something similar to the above as a props.conf extraction, but it creates a separate field for each value_x, so I end up with a bunch of fields. I guess I could always rename them in the search or alias them all to the same field. The real challenge seems to be parsing only as many values as are actually present. If there is a single value, the search above, as well as my other attempts, keeps parsing the following text as value_2. I'm trying to get it to stop at the ')', or, if there is a comma, continue with value_2, and repeat until the ')' comes. The actual log continues past field_a. I figure if I can get the extraction working for one field, I can replicate it for the others. Here is an example of what I am talking about:

Example log: timestamp 192.168.x.x process: field_a (value_1, value_2,...) , field_b (value_1, value_2,...), field_c (value)
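One way around the fixed number of groups is to match each "name (...)" block with a character class that cannot cross a closing parenthesis, then split whatever it captured on commas. A minimal Python sketch, assuming no value ever contains a parenthesis or a comma; the sample event is made up from the examples in this thread, and parse_fields is a hypothetical helper name:

```python
import re

# One match per "name (v1, v2, ...)" block. [^)]* cannot cross a ")",
# so the match for field_a ends at its own closing parenthesis.
FIELD_BLOCK = re.compile(r"(?P<name>\w+)\s*\((?P<values>[^)]*)\)")

def parse_fields(event):
    """Map each field name to the list of its comma-separated values."""
    fields = {}
    for match in FIELD_BLOCK.finditer(event):
        # Split on commas, then trim surrounding spaces so multi-word values stay whole.
        values = [v.strip() for v in match.group("values").split(",") if v.strip()]
        fields[match.group("name")] = values
    return fields

event = ("timestamp 192.168.x.x process: field_a (SYN, ACK) , "
         "field_b ( Block List, Threat List, Traffic Misuse), field_c (SYN)")
print(parse_fields(event))
```

This is the same two-step idea (extract the whole list, then split on commas) that the Splunk Employee answer in this thread applies with EVAL and split().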

Thanks for any help.


Re: How to write the regex to parse comma separated values for a single field in a log?

SplunkTrust

You said you're able to extract everything inside the parentheses as the value of field_a. So what do your fields look like?
Sample log:

time_stamp 192.168.x.x process: field_a (value_1, value_2) , field_b (value_3, value_4,value_5), field_c (value_6)

Is it something like this?

_time         field_a              field_b                      field_c
time_stamp    "value_1,value_2"    "value_3,value_4,value_5"    "value_6"

Re: How to write the regex to parse comma separated values for a single field in a log?

Path Finder

Yes. When I extract everything inside the parentheses, it looks like what you describe above.


Re: How to write the regex to parse comma separated values for a single field in a log?

Splunk Employee

Here are two options:

props.conf

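 [my_sourcetype]
 # https://regex101.com/r/bQ1kK6/1
 # EXTRACT needs a named capturing group; it becomes the field name
 EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
 # trim spaces around each comma, then split into a multivalue field
 EVAL-session_state = split(replace(session_state, "\s*,\s*", ","), ",")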
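To see what option 1 produces for one event, here is a Python approximation of the two steps, assuming Python's re and str.split behave like the EXTRACT regex and eval's split() for this input. session_state_values is a hypothetical helper. The whitespace normalisation corresponds to wrapping the split() in replace(session_state, "\s*,\s*", ","); without it, splitting "SYN, ACK" on "," leaves a leading space on the second value.

```python
import re

# Same shape as the EXTRACT-0 regex: the named group captures everything
# between the parentheses that follow the literal text "session_state".
EXTRACT = re.compile(r"session_state \((?P<session_state>[^)]+)\)")

def session_state_values(event):
    """Return the multivalue result for one event, or [] if the regex does not match."""
    match = EXTRACT.search(event)
    if match is None:
        return []
    raw = match.group("session_state")
    # Collapse ", " (and " , ") to "," first, then split on ",".
    return re.sub(r"\s*,\s*", ",", raw).split(",")

print(session_state_values("192.168.x.x process: session_state (SYN, ACK)"))  # ['SYN', 'ACK']
print("SYN, ACK".split(","))  # ['SYN', ' ACK'] when splitting without the replace step
```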

Or

props.conf

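 [my_sourcetype]
 # https://regex101.com/r/bQ1kK6/1
 EXTRACT-0 = session_state \((?<session_state_raw>[^\)]+)\)
 REPORT-0 = session_state_mv

transforms.conf

 [session_state_mv]
 SOURCE_KEY = session_state_raw
 # each match becomes one value; MV_ADD keeps every match instead of only the first
 REGEX = (?<session_state>[^,\s][^,]*)
 MV_ADD = true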
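Option 2 relies on the REPORT regex matching repeatedly inside the extracted string; as I understand MV_ADD = true, each match is then kept as one more value. The Python sketch below imitates that with re.findall and compares an uppercase-only pattern such as [A-Z]+ with a pattern such as [^,\s][^,]* that keeps multi-word values together. mv_values, VALUE, and UPPER_ONLY are illustrative names, not Splunk settings.

```python
import re

# A run of non-comma characters that does not start with whitespace,
# so "Block List" stays one value.
VALUE = re.compile(r"[^,\s][^,]*")
# Uppercase letters only: fine for "SYN, ACK", but it splits mixed-case words.
UPPER_ONLY = re.compile(r"[A-Z]+")

def mv_values(pattern, extracted):
    """Collect every non-overlapping match, one value per match (roughly what MV_ADD = true keeps)."""
    return pattern.findall(extracted)

print(mv_values(UPPER_ONLY, "SYN, ACK"))                  # ['SYN', 'ACK']
print(mv_values(UPPER_ONLY, " Block List, Threat List"))  # ['B', 'L', 'T', 'L']
print(mv_values(VALUE, " Block List, Threat List"))       # ['Block List', 'Threat List']
```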


Re: How to write the regex to parse comma separated values for a single field in a log?

Path Finder

Thanks. I used option 1 and this worked. Thanks again!


Re: How to write the regex to parse comma separated values for a single field in a log?

Splunk Employee

To build on what Satoshi showed you, but doing it all with rex in the search, here is what happens. First you extract SESSION_STATE with the rex command, then you pass SESSION_STATE through eval's split() function with a comma as the delimiter, and you end up with a multivalued field.

index=_internal | head 1 | eval RAW="192.168.x.x process: field_a (value_1, value_2,value_3,value_4,value_5,value_6)" | rex field=RAW "\((?<SESSION_STATE>[^)]+)" | eval SESSION_VALUES=split(SESSION_STATE,",") | table RAW,SESSION_STATE,SESSION_VALUES
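For comparison, the same rex and split steps on the demo string can be mimicked in Python, with re.search standing in for rex and str.split for eval's split(). This is an approximation for checking the logic, not Splunk itself.

```python
import re

RAW = "192.168.x.x process: field_a (value_1, value_2,value_3,value_4,value_5,value_6)"

# rex: capture everything after "(" up to, but not including, the first ")".
session_state = re.search(r"\((?P<SESSION_STATE>[^)]+)", RAW).group("SESSION_STATE")

# eval split(SESSION_STATE, ","): a plain split with no trimming.
session_values = session_state.split(",")

print(session_values)
```

Because the demo string has a space after the first comma, the second value comes back as " value_2" with a leading space; trimming around the commas before splitting avoids that if exact matches matter.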
