I need some help trying to parse a log that may have something like the following:
192.168.x.x process: field_a (value_1, value_2,...)
Where value_1 and value_2 (and so on) are all values for field_a. You can have anywhere from 1 to about 6 values for field_a in a single log. An example would be:
session_state (SYN, ACK)
or
session_state (SYN)
Any help would be nice. Right now I am limited to reporting everything between the parentheses as the value for field_a.
I have two options for you:
props.conf
[my_sourcetype]
# https://regex101.com/r/bQ1kK6/1
# EXTRACT needs a named capture group so it knows which field to populate
EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
EVAL-session_state = split(session_state,",")
Or
props.conf
[my_sourcetype]
# https://regex101.com/r/bQ1kK6/1
# EXTRACT needs a named capture group so it knows which field to populate
EXTRACT-0 = session_state \((?<session_state>[^\)]+)\)
REPORT-0 = session_state_mv
transforms.conf
[session_state_mv]
SOURCE_KEY = session_state
# Named group tells the transform which field each match is added to
REGEX = (?<session_state>[A-Z]+)
MV_ADD = true
To build on what Satoshi was showing you, but doing it all inline in the search, here is what goes on. First you extract SESSION_STATE using the rex command, then you pass SESSION_STATE through eval with the split function, using a comma as the delimiter, and you end up with a multivalue field.
index=_internal | head 1 | eval RAW="192.168.x.x process: field_a (value_1, value_2,value_3,value_4,value_5,value_6)" | rex field=RAW "\((?<SESSION_STATE>[^)]+)" | eval SESSION_VALUES=split(SESSION_STATE,",") | table RAW,SESSION_STATE,SESSION_VALUES
Thanks. I used option 1 and this worked. Thanks again!
Assuming that the values are alphanumeric only, you have at least one value and a max of six, the parens are only found around the values list in the event, and you want to do this at search time with a rex command, this should work for you:
<your search> | rex field=_raw "\((?P<value_1>\w+)(,\s*(?P<value_2>\w+))?(,\s*(?P<value_3>\w+))?(,\s*(?P<value_4>\w+))?(,\s*(?P<value_5>\w+))?(,\s*(?P<value_6>\w+))?\)"
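The pattern above produces one numbered field per position. If a single multivalue field is preferable, one possible sketch is a two-step rex: first grab everything between the parentheses, then match each comma-separated item with max_match=0 so every match is kept. The names value_list and value are hypothetical, and this assumes the first parenthesized group in the event is the values list.

```
<your search>
| rex field=_raw "\((?<value_list>[^)]+)\)"
| rex field=value_list max_match=0 "(?<value>[^,\s](?:[^,]*[^,\s])?)"
| table value_list, value
```

The second pattern starts and ends each match on a non-space, non-comma character, so an item such as "Block List" should keep its inner space while the padding around the commas is dropped.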
A couple key notes:
- I am adding this regex as an extraction in props.conf for a TA.
- There can be multiple words for a single value (e.g. signature (Block List, Threat List, Traffic Misuse)).
- The values are all words, no numbers.
I tried something similar to the above in a props extraction, but it creates a separate field extraction for each 'value_x', so I would end up with a bunch of fields. I guess I can always rename them in search or alias them all to the same field. The real challenge seems to be parsing only the number of values that are actually present. If there is a single value, the search above, as well as my other attempts, goes on to parse the next bits of the log as value_2. I am trying to get it to stop at the ')', or, if there is a comma, continue with value_2 and repeat until the ')' comes. The actual log continues past this 'field_a'. I figure if I can get the extraction working on this one, I can replicate it for the other fields. Here is an example of what I am talking about.
ex log: time_stamp 192.168.x.x process: field_a (value_1, value_2,...) , field_b (value_1, value_2,...), field_c (value)
Thanks for any help.
You said you're able to show everything within the parentheses as the value of field_a. So how do your fields look?
Sample log
time_stamp 192.168.x.x process: field_a (value_1, value_2) , field_b (value_3, value_4,value_5), field_c (value_6)
Is it something like this?
_time field_a field_b field_c
time_stamp "value_1,value_2" "value_3,value_4,value_5" "value_6"
Yes. When I get everything inside the parentheses, it looks like what you describe above.
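Given that each field already arrives as one comma-separated string, a minimal props.conf sketch for the TA could add one calculated field per extraction, following the same EXTRACT plus EVAL approach as option 1 earlier in the thread. It assumes field_a, field_b and field_c are already extracted under those names in the [my_sourcetype] stanza. The replace call is an addition here that strips the spaces around the commas, which also keeps multi-word values such as "Block List" intact.

```
[my_sourcetype]
# Assumes field_a, field_b and field_c already hold the full text
# found between the parentheses, e.g. "value_1, value_2".
# trim() drops outer spaces, replace() drops spaces around commas,
# split() turns the result into a multivalue field.
EVAL-field_a = split(replace(trim(field_a), "\s*,\s*", ","), ",")
EVAL-field_b = split(replace(trim(field_b), "\s*,\s*", ","), ",")
EVAL-field_c = split(replace(trim(field_c), "\s*,\s*", ","), ",")
```

A field with a single value simply comes out as a one-item multivalue field, so the same stanza should cover anywhere from 1 to 6 values without separate patterns.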