How do I write a regex to parse comma-separated values for a single field in a log?

sswansonchtr
Path Finder

I need some help trying to parse a log that may have something like the following:

 192.168.x.x process: field_a (value_1, value_2,...)

Here value_1, value_2, and so on are all values for field_a. A single log can contain anywhere from 1 to about 6 values for field_a. For example:

session_state (SYN, ACK)
or
session_state (SYN)

Any help would be appreciated. Right now I can only report everything between the parentheses as the value of field_a.

skawasaki_splun
Splunk Employee

I got 2 options:

props.conf

 [my_sourcetype]
 # https://regex101.com/r/bQ1kK6/1
 EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
 EVAL-session_state = split(replace(session_state, "\s*,\s*", ","), ",")
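To sanity-check option 1 outside Splunk, here is a rough Python equivalent. It is only a sketch: it assumes Python's `re` behaves like Splunk's PCRE for this simple pattern (Python spells named groups `(?P<name>...)`), and the `extract_session_state` helper is hypothetical. It captures everything between the parentheses after `session_state`, then splits on commas and trims each value; a plain split on "," would keep the space after each comma.

```python
import re

# Same idea as the EXTRACT-0 regex: capture everything between the
# parentheses that follow "session_state" into a named group.
SESSION_RE = re.compile(r"session_state \((?P<session_state>[^)]+)\)")


def extract_session_state(raw: str) -> list[str]:
    """Hypothetical helper: return the values as a list, like an mv field."""
    match = SESSION_RE.search(raw)
    if match is None:
        return []
    # Split on commas, then trim the whitespace around each value.
    return [v.strip() for v in match.group("session_state").split(",")]


print(extract_session_state("192.168.x.x process: session_state (SYN, ACK)"))
print(extract_session_state("192.168.x.x process: session_state (SYN)"))
```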

Or

props.conf

 [my_sourcetype]
 # https://regex101.com/r/bQ1kK6/1
 EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
 REPORT-0 = session_state_mv

transforms.conf

 [session_state_mv]
 SOURCE_KEY = session_state
 REGEX = (?<session_state>[A-Z]+)
 MV_ADD = true
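A Python approximation of option 2 may help show what the `[A-Z]+` pattern collects. This sketch assumes that Splunk applies the search-time REGEX to every match in the source field and that MV_ADD keeps each match as another value; `report_values` is a hypothetical name. It also shows a caveat that follows from the regex itself: `[A-Z]+` matches only runs of uppercase letters, so it suits values like SYN and ACK but breaks mixed-case, multi-word values apart.

```python
import re

# Stand-in for the transforms.conf REGEX: every run of uppercase letters
# becomes one more value of the multivalue field.
VALUE_RE = re.compile(r"[A-Z]+")


def report_values(session_state: str) -> list[str]:
    """Hypothetical helper: collect every match, in order."""
    return VALUE_RE.findall(session_state)


print(report_values("SYN, ACK"))                 # ['SYN', 'ACK']
print(report_values("Block List, Threat List"))  # ['B', 'L', 'T', 'L']
```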

dmaislin_splunk
Splunk Employee

Building on what Satoshi showed, but doing it all with rex in the search, here is what happens. First, extract SESSION_STATE with the rex command; then pass SESSION_STATE to eval's split function with a comma as the delimiter, which gives you a multivalued field.

index=_internal | head 1 | eval RAW="192.168.x.x process: field_a (value_1, value_2,value_3,value_4,value_5,value_6)" | rex field=RAW "\((?<SESSION_STATE>[^)]+)" | eval SESSION_VALUES=split(SESSION_STATE,",") | table RAW,SESSION_STATE,SESSION_VALUES
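The two steps of that search can be replayed in Python on the same sample string, as a rough sketch (assuming Python's `re` matches `[^)]+` the same way Splunk's PCRE does). `[^)]+` stops at the first `)`, and a plain split on "," keeps the space that follows a comma, so " value_2" comes back with a leading space.

```python
import re

# The same sample event the search builds with eval RAW="..."
RAW = ("192.168.x.x process: field_a "
       "(value_1, value_2,value_3,value_4,value_5,value_6)")

# rex field=RAW "\((?<SESSION_STATE>[^)]+)": everything after the first "("
# up to, but not including, the next ")"
match = re.search(r"\((?P<SESSION_STATE>[^)]+)", RAW)
session_state = match.group("SESSION_STATE")

# eval SESSION_VALUES=split(SESSION_STATE,","): a plain split, so any space
# after a comma stays attached to the next value
session_values = session_state.split(",")

print(session_state)
print(session_values)
```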

sswansonchtr
Path Finder

Thanks. I used option 1 and this worked. Thanks again!

cpetterborg
SplunkTrust

Assuming that the values are alphanumeric only, there is at least one value and at most six, the parentheses appear only around the value list in the event, and you want to do this at search time with the rex command, this should work for you:

<your search> | rex field=_raw "\((?P<value_1>\w+)(?:,\s*(?P<value_2>\w+))?(?:,\s*(?P<value_3>\w+))?(?:,\s*(?P<value_4>\w+))?(?:,\s*(?P<value_5>\w+))?(?:,\s*(?P<value_6>\w+))?\)"
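A Python sketch of this one-to-six-values approach, useful for testing a pattern before putting it into rex. It wraps each optional value in its own `(?:,\s*(?P<value_n>\w+))?` group, so a later value is captured only when a comma actually follows and a single value stops cleanly at the `)`. The `extract` helper is hypothetical, and `\w+` still assumes single-word values: a value with a space in it will not match.

```python
import re

# Each optional group needs a leading comma, so with a single value the
# pattern moves straight on to the closing ")".
PATTERN = re.compile(
    r"\((?P<value_1>\w+)"
    r"(?:,\s*(?P<value_2>\w+))?"
    r"(?:,\s*(?P<value_3>\w+))?"
    r"(?:,\s*(?P<value_4>\w+))?"
    r"(?:,\s*(?P<value_5>\w+))?"
    r"(?:,\s*(?P<value_6>\w+))?"
    r"\)"
)


def extract(raw: str) -> dict:
    """Hypothetical helper: named groups that did not match come back as None."""
    match = PATTERN.search(raw)
    return match.groupdict() if match else {}


print(extract("process: session_state (SYN)"))
print(extract("process: session_state (SYN, ACK)"))
```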

sswansonchtr
Path Finder

A couple of key notes:
- I am adding this regex as an extraction in props.conf for a TA.
- A single value can contain multiple words (e.g., signature ( Block List, Threat List, Traffic Misuse)).
- The values are all words, no numbers.

I tried something similar to the above as an extraction in props.conf, but it creates a separate field for each value_x, so I end up with a bunch of fields. I guess I could always rename them in the search or alias them all to the same field. The real challenge seems to be parsing only as many values as are actually present. If there is a single value, the search above, like my other attempts, keeps going and parses the next bit of the event as value_2. I'm trying to get it to stop at the ')', or, if there is a comma, continue with value_2 and repeat until the ')' is reached. The actual log continues past field_a. I figure if I can get the extraction working for one field, I can replicate it for the other fields. Here is an example of what I mean:

ex log: time_stamp 192.168.x.x process: field_a (value_1, value_2,...) , field_b (value_1, value_2,...), field_c (value)

Thanks for any help.
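For the multi-field case, one way to prototype the logic is to match every `name (values)` group in the event and split each value list on commas, keeping multi-word values intact. This Python sketch is not Splunk configuration; it assumes field names are single `\w+` tokens and that values never contain `)` or `,`. The `parse_fields` helper and the sample event are hypothetical, built from the examples in this thread. Because `[^)]*` cannot cross a `)`, each value list ends at its own closing parenthesis, which is the stopping behavior described above.

```python
import re

# One "name (v1, v2, ...)" group; [^)]* stops the value list at the first ")".
FIELD_RE = re.compile(r"(?P<name>\w+)\s*\((?P<values>[^)]*)\)")


def parse_fields(raw: str) -> dict[str, list[str]]:
    """Hypothetical helper: map each field name to its list of values."""
    fields = {}
    for m in FIELD_RE.finditer(raw):
        values = [v.strip() for v in m.group("values").split(",")]
        fields[m.group("name")] = [v for v in values if v]  # drop empties
    return fields


log = ("time_stamp 192.168.x.x process: signature ( Block List, Threat List, "
       "Traffic Misuse) , field_b (value_3, value_4,value_5), field_c (value_6)")
print(parse_fields(log))
```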

somesoni2
Revered Legend

You said you can already extract everything within the parentheses as the value of field_a. So what do your fields look like?
Sample log:

time_stamp 192.168.x.x process: field_a (value_1, value_2) , field_b (value_3, value_4,value_5), field_c (value_6)

Is it something like this?

 _time         field_a              field_b                      field_c
 time_stamp    "value_1,value_2"    "value_3,value_4,value_5"    "value_6"

sswansonchtr
Path Finder

Yes. When I extract everything inside the parentheses, it looks like what you describe above.
