I need some help parsing a log that may contain something like the following:
192.168.x.x process: field_a (value_1, value_2,...)
The values inside the parentheses (value_1, value_2, and so on) are all values for field_a. A single log can have anywhere from 1 to about 6 values for field_a. An example would be:
sessionstate (SYN, ACK)
Any help would be nice. Right now I am limited to reporting everything between the parentheses as the value for field_a.
Assuming that the values are alphanumeric only, that there is at least one value and at most six, that parentheses appear in the event only around the list of values, and that you want to do this at search time with the rex command, this should work for you:
<your search> | rex field=_raw "\((?P<value_1>\w+)(?:,\s*(?P<value_2>\w+))?(?:,\s*(?P<value_3>\w+))?(?:,\s*(?P<value_4>\w+))?(?:,\s*(?P<value_5>\w+))?(?:,\s*(?P<value_6>\w+))?\)"
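To check how a pattern of this shape behaves with fewer than six values, here is a small Python sketch. It assumes Python's re module (which accepts the same (?P<name>...) syntax) is a close enough stand-in for Splunk's PCRE for these simple patterns; PATTERN and extract_values are names made up for illustration. Each optional group is closed so the parentheses balance, and `,\s*` is used so a missing space after a comma still matches.

```python
import re

# Python analogue of the rex above (Splunk uses PCRE; these constructs behave the same).
# value_1 is required; value_2..value_6 are each wrapped in an optional (?:,\s*(...))? group.
PATTERN = re.compile(
    r"\((?P<value_1>\w+)"
    + "".join(rf"(?:,\s*(?P<value_{i}>\w+))?" for i in range(2, 7))
    + r"\)"
)

def extract_values(event: str) -> dict:
    """Return value_1..value_6 from the first matching list; unused groups are None."""
    match = PATTERN.search(event)
    return match.groupdict() if match else {}

print(extract_values("192.168.x.x process: sessionstate (SYN, ACK)"))
print(extract_values("field_a (one) , field_b (two, three)"))
```

Because every optional group must be followed by either another `, value` or the closing `\)`, a single value stops at its own ')' instead of running into the next field. The trade-off is that `\w+` cannot match multi-word values such as "Block List", so those events do not match at all.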
A couple of key notes:
- I am adding this regex as an extraction in props.conf for a TA.
- There can be multiple words in a single value (e.g. signature ( Block List, Threat List, Traffic Misuse)).
- The values are all words, no numbers.
I tried something similar to the above as a props.conf extraction, but it creates a separate field for each value_x, so I would end up with a bunch of fields. I guess I can always rename them in the search or alias them all to the same field. The real challenge seems to be parsing only as many values as are actually present. If there is a single value, the search above, as well as my other attempts, continues to parse the next bits of the event as value_2. I am trying to get it to stop at the ')', or, if there is a comma, continue with value_2, and repeat until the ')' is reached. The actual log continues past field_a. I figure if I can get the extraction working on that one field, I can replicate it for the other fields. Here is an example of what I am talking about:
Example log: timestamp 192.168.x.x process: field_a (value_1, value_2,...) , field_b (value_1, value_2,...), field_c (value)
Thanks for any help.
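For the fuller case described in this question (several fields per event, multi-word values, one to about six values each), the following Python sketch can be used to test the "capture the whole list, then split it" idea outside Splunk. It is a hypothetical parser, not a Splunk configuration: FIELD_LIST and parse_fields are illustrative names, and it assumes values never contain commas or parentheses themselves.

```python
import re

# Hypothetical pattern: a field name, optional spaces, then "( ... )".
# [^()]* allows multi-word values such as "Block List" and stops at the first ')'.
FIELD_LIST = re.compile(r"(?P<name>\w+)\s*\((?P<values>[^()]*)\)")

def parse_fields(event: str) -> dict[str, list[str]]:
    """Map each field name to its list of values, trimming surrounding spaces."""
    fields: dict[str, list[str]] = {}
    for match in FIELD_LIST.finditer(event):
        raw_values = match.group("values").split(",")
        fields[match.group("name")] = [v.strip() for v in raw_values if v.strip()]
    return fields

event = ("timestamp 192.168.x.x process: field_a (value_1, value_2) , "
         "field_b (value_3, value_4,value_5), "
         "signature ( Block List, Threat List, Traffic Misuse)")
print(parse_fields(event))
```

Splitting after capturing the whole list means the number of values never has to be known in advance, which avoids the fixed value_1 to value_6 fields entirely.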
You said you're able to extract everything inside the parentheses as the value of field_a. So what do your fields look like for an event like this:
time_stamp 192.168.x.x process: field_a (value_1, value_2) , field_b (value_3, value_4,value_5), field_c (value_6)
Is it something like this?
_time        field_a              field_b                      field_c
time_stamp   "value_1,value_2"    "value_3,value_4,value_5"    "value_6"
Yes. When I extract everything inside the parentheses, it looks like what you describe above.
I have two options:
Option 1, props.conf only:
[my_sourcetype]
# https://regex101.com/r/bQ1kK6/1
EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
EVAL-session_state = split(session_state,",")
Option 2, props.conf plus a transform:
[my_sourcetype]
# https://regex101.com/r/bQ1kK6/1
EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
REPORT-0 = session_state_mv
# transforms.conf
[session_state_mv]
SOURCE_KEY = session_state
REGEX = ([A-Z]+)
FORMAT = session_state::$1
MV_ADD = true
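The two options differ in how the captured list is broken into values. The Python sketch below mimics each one on the text captured from "sessionstate (SYN, ACK)". It is only an analogue: it assumes Splunk's eval split() behaves like str.split (no trimming) and that the REPORT transform keeps every match of its REGEX, the way re.findall does.

```python
import re

captured = "SYN, ACK"  # text between the parentheses, as captured by EXTRACT-0

# Option 1 analogue: split(session_state, ",") cuts on the comma only,
# so the space after the comma stays attached to the next value.
option_1 = captured.split(",")

# Option 2 analogue: REGEX = ([A-Z]+) applied repeatedly keeps only runs of capitals.
option_2 = re.findall(r"[A-Z]+", captured)

print(option_1)
print(option_2)

# The [A-Z]+ pattern breaks multi-word, mixed-case values into single letters.
print(re.findall(r"[A-Z]+", "Block List, Threat List"))
```

Option 1 keeps multi-word values intact but may leave leading spaces; option 2 yields clean tokens for uppercase flags like SYN and ACK but would need a different REGEX for values such as "Block List".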
To build on what Satoshi showed, but doing it all in the search with rex, here is what happens. First you extract SESSION_STATE with the rex command, then you pass SESSION_STATE to eval's split function with a comma as the delimiter, which gives you a multivalued field.
index=_internal | head 1
| eval RAW="192.168.x.x process: field_a (value_1, value_2,value_3,value_4,value_5,value_6)"
| rex field=RAW "\((?<SESSION_STATE>[^)]+)"
| eval SESSION_VALUES=split(SESSION_STATE,",")
| table RAW,SESSION_STATE,SESSION_VALUES
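The same pipeline can be traced step by step in Python to see what each stage produces. This is a sketch that assumes rex and split() behave like re.search and str.split for this input; the trimming step at the end is an added assumption, not part of the search above.

```python
import re

RAW = "192.168.x.x process: field_a (value_1, value_2,value_3,value_4,value_5,value_6)"

# rex field=RAW "\((?<SESSION_STATE>[^)]+)": everything after "(" up to the next ")".
session_state = re.search(r"\((?P<SESSION_STATE>[^)]+)", RAW).group("SESSION_STATE")

# eval SESSION_VALUES=split(SESSION_STATE,","): one value per comma-separated item.
session_values = session_state.split(",")

# Added clean-up (not in the original search): drop spaces around each value.
trimmed = [v.strip() for v in session_values]

print(session_state)
print(session_values)
print(trimmed)
```

Because the sample has a space after the first comma, the second value comes out as " value_2". An untested SPL sketch of the same clean-up would be to normalise the separators before splitting, for example eval SESSION_VALUES=split(replace(SESSION_STATE, " *, *", ","), ",").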