Splunk Search

How can I write this regex for rex to extract multiple matches?

bowesmana
SplunkTrust

I'm trying to parse saved searches that contain a bunch of eval statements that do this sort of logic

 

| eval var=case(
  a,b,
  c,d,
  e,f)
| eval var2=case(
  match(x, "z|y|z"), 1,
  match(x, "a|b|c"), 2)
| eval...

 

I have the search string from the REST API response and am trying to extract all the LHS=RHS statements with

 

| rex field=search max_match=0 "(?s)\|\s*eval (?<field>\w+)=(?<data>.*?)"

 

This captures all the fields in <field> nicely, i.e. var and var2 in this example (thanks to the non-greedy ?), but I am struggling with capturing <data>: the data is multi-line, and if I don't use the non-greedy ? then I only get ONE match returned, with data being the remainder of the search string, i.e. greedy (.*)

I can't use [^|]* (which is effectively much the same) as the eval statements may themselves contain pipes |, so I want to extract up to the next \n|\s?eval
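To make it concrete, here is a stripped-down run-anywhere version of what I'm seeing (the search string is made up):

| makeresults
| eval search="| eval var=case(a,b,
  c,d)
| eval var2=case(e,f,
  g,h)
| stats count"
| rex field=search max_match=0 "(?s)\|\s*eval (?<field>\w+)=(?<data>.*?)"

With the non-greedy ? both evals show up in field but data comes back empty; with a greedy .* there is only one match and data swallows everything to the end of the string.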

I've been banging around with regex101 but just can't figure out the syntax to get this to work.

Any ideas?


ITWhisperer
SplunkTrust

Try using a lookahead (you might also want some more whitespace matchers with quantifiers):

(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>.*?)(?=\s+\|)

https://regex101.com/r/ZkTPpi/1

This doesn't cope with cases where there is no whitespace between the end of the eval and the next pipe symbol, but it might get you closer to what you need.
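For example, against a made-up multi-line search string:

| makeresults
| eval search="| eval var=case(a,b,
  c,d)
| eval var2=case(e,f,
  g,h)
| stats count"
| rex field=search max_match=0 "(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>.*?)(?=\s+\|)"

This should give field=var,var2 with data holding each multi-line case(...) expression.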


ITWhisperer
SplunkTrust

This is a little better as it deals with quoted strings, but it would still fail for escaped quotes within those strings.

(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>([^\"]|\"[^\"]+\"[^\"])+?)(?=\s*\|)

 


ITWhisperer
SplunkTrust

Hopefully, this deals with escaped quotes within the quoted strings as well.

https://regex101.com/r/fI0sii/1

(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>([^\"]|\"(\\\\|\\\"|[^\"])+\"[^\"])+?)(?=\s*\|)
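For example, against a made-up eval whose quoted strings contain both a pipe and an escaped quote:

| makeresults
| eval search="| eval var=case(
    match(x, \"a|b\"), \"say \\\"hi\\\"\",
    true(), 0)
| stats count"
| rex field=search max_match=0 "(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>([^\"]|\"(\\\\|\\\"|[^\"])+\"[^\"])+?)(?=\s*\|)"

Here data should come back as the whole case(...) body, with the pipe and the escaped quotes left intact.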

 


bowesmana
SplunkTrust

Nice @ITWhisperer! Interestingly, this one came back with "has exceeded the configured depth_limit, consider raising the value in limits.conf".

However, on a less complex search it almost works. It just doesn't pick up the last eval - example below:

| makeresults
| fields - _time
| eval search="| eval is_priv=case(
    true(), coalesce(is_priv, 0)) 

| eval is_jump=case(
    true(), coalesce(is_jump, 0)) 

| eval is_important_data_repo=case(
    true(), coalesce(is_important_data_repo, 0)) 

| eval is_sensitive_data=case(
    true(), coalesce(is_sensitive_data, 0))"
| rex field=search max_match=0 "(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>([^\"]|\"[^\"]+\"[^\"])+?)(?=\s*\|)"
| eval tmp=mvzip(field, data, "######")
| fields tmp
| mvexpand tmp
| eval field=mvindex(split(tmp, "######"), 0)
| eval data=mvindex(split(tmp, "######"), 1)

 I can stick a dummy "| eval" on the end to make it work - but I'll have to stare at this regex for a while to understand it - thanks
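For the record, the "dummy | eval" I mean is nothing more than gluing an extra pipe onto the end of the field just before the rex, e.g.

| eval search=search . "
|"

which gives the lookahead something to stop against for the final eval.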

 


ITWhisperer
SplunkTrust

OK, try this:

| makeresults
| fields - _time
| eval search="| eval is_priv=case(
    true(), coalesce(is_priv, 0)) 

| eval is_jump=case(
    true(), coalesce(is_jump, 0)) 

| eval is_important_data_repo=case(
    true(), coalesce(is_important_data_repo, 0)) 

| eval is_sensitive_data=case(
    true(), coalesce(is_sensitive_data, 0))

| eval is_sensitive_data=case(
    true(), coalesce(is_sensitive_data, 0))
| eval var=case(
  a,b,
  c,d,
  e,f)
| eval var2=case(
  match(x, \"z|y|z\"), 1,
  match(x, \"a|b|c\"), 2)
| eval var3=case(
  match(x, \"z\\\"|y|z\"), 1,
  match(x, \"a|b|c\"), 2)
| stats
| table"
| rex field=search max_match=0 "(?s)\|(?<line>.*?eval([^\"]|\"(\\\\\"|[^\"])+?\")*?)(?=\s*\|)"
| fields - search
| mvexpand line
| rex field=line "(?s)\s*eval\s+(?<field>\w+)\s*=\s*(?<data>.*)"

I extended your sample data to include strings and escaped quotes (I also added trailing commands because I figured it was unlikely that your searches would finish with an eval, although this is possible and as you say, you could just add an extra pipe to the string before the rex).

The first rex extracts the lines containing an eval, which can then be mvexpanded to avoid the mvzip/split manipulations.

The second rex then extracts the field name and its data from each line.

The complex regex looks for anything that isn't a double quote [^\"], or, if it is in double quotes \", it accepts escaped quotes \\\\\" or anything which isn't a double quote [^\"] up to the next (unescaped) double quote \", finishing with a lookahead to ensure there is a pipe, with or without preceding whitespace.
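Written out in free-spacing form - this is the pattern as PCRE sees it once Splunk has resolved the string escaping, i.e. what regex101 sees, not what you paste into rex - it breaks down roughly as:

(?sx)
\|                       # a literal pipe starts each command
(?<line>
  .*? eval               # lazily skip up to and including the word eval
  (                      # the rest of the command is made of either
    [^"]                 #   any character that is not a double quote, or
  |
    " (\\" | [^"])+? "   #   a quoted string, whose body may contain escaped quotes \"
  )*?                    # repeated lazily
)
(?=\s*\|)                # stop just before the next pipe, with or without preceding whitespace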

bowesmana
SplunkTrust

@ITWhisperer 

Thanks for the detailed explanation and regex. It does work, but the only issue is that the eval statements are too long for the expression depth - limits.conf has a max depth of 1000 and some of these evals are well over 1,000 characters. This is one example, but not the longest by any means.

| makeresults
| fields - _time
| eval search="| eval os=case(
    (match(lower(os_ident),\"windows\") AND (match(lower(os_ident),\"10\") OR match(lower(os_ident),\"7\")) AND match(lower(entity),\"^e\d+\")),\"windows\",
    (match(lower(os_ident),\"windows\") AND (match(lower(os_ident),\"10\") OR match(lower(os_ident),\"7\")) AND match(lower(entity),\"^\w{4}\-(?:wv|rd|pw)\")),\"windows\",
    (match(lower(os_ident),\"linux\") AND match(splunk_index,\"prod_osnix\")), \"linux\",
    (match(lower(os_ident),\"windows\") AND match(lower(os_ident),\"server\")),\"windows\",
    (match(lower(os_ident),\"windows\")),\"windows\",
    (match(lower(os_ident),\"printer\")),\"printer\",
    match(entity,\"^e\d+\"),\"windows\",
    match(entity,\"^\w{4}\-(?:wv|rd|pw)\"),\"windows\",
    match(splunk_index,\"oswin\"),\"windows\",
    match(splunk_index,\"osnix\"),\"linux\",
    match(lower(os_ident),\"linux\"),\"linux\",
    match(lower(os_ident),\"mac\s*os\"),\"macintosh\",
    (match(entity,\"^\w{4}\-(?:fr|sw|rt)\d+l\") AND match(lower(os_ident),\"junos|juniper\")),\"juniper junos\",
    (match(entity,\"^\w{4}\-sw\d\w\d+[ab]?l\") AND match(lower(os_ident),\"junos|juniper\")),\"juniper junos\",
    (match(entity,\"^\w{4}\-(?:fr|sw|rt)\d+l\") AND match(lower(os_ident),\"cisco\")),\"cisco ios\",
    (match(entity,\"\w{4}\-sw\d\w\d+[ab]?l\") AND match(lower(os_ident),\"cisco\")),\"cisco ios\",
    isnotnull(os_ident),\"other\",
    true(),null()) 
| stats
| table"
| rex field=search max_match=0 "(?s)\|(?<line>.*?eval([^\"]|\"(\\\\\"|[^\"])+?\")*?)(?=\s*\|)"
| fields - search
| mvexpand line
| rex field=line "(?s)\s*eval\s+(?<field>\w+)\s*=\s*(?<data>.*)"
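For reference, the limit the error message refers to appears to be the regex depth limit in limits.conf - I believe it is depth_limit under the [rex] stanza (check limits.conf.spec for your version), something like:

[rex]
depth_limit = 10000

but I didn't want to bump server-wide limits just for a support dashboard.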

So, I ended up using a different approach - first splitting the search string into its individual lines as a multivalue field and expanding them.

Using streamstats I then remove all the non-eval sections, stitch each eval back into its own single-value string, and stats them all back together with list(x). It has to be a two-stage process because list() is capped at 100 values, and there are more than 100 lines unless you join each eval's lines first.

I guess that's the beauty of Splunk/SPL - you can always get where you're going - this doesn't have to be efficient and it works. The end of each eval is denoted by a pipe at the beginning of the next line.

| rest /servicesNS/-/app_name/saved/searches 
| fields title description author eai:acl.owner updated search eai:acl.sharing disabled is_scheduled cron_schedule
| where is_scheduled=1 AND disabled=0
| eval search=split(replace(search, "\n", "!@#!@#"), "!@#!@#")
| mvexpand search
| where !match(search, "^[\s]*`{3}.*`{3}")
| streamstats c as ev reset_before="match(search, \"^\|\")" by title
| where ev>1 OR (ev=1 AND match(search, "\|\s?eval"))
| streamstats values(ev) as items reset_before="ev=1" by title
| search items=1
| streamstats count(eval(ev=1)) as group by title
| fields - items ev 
| stats values(*) as * list(search) as search by title group
| sort title group
| eval search=mvjoin(search, "
")
| rex field=search "(?s)\|\s*eval\s+(?<field>\w+)\s*=\s*(?<data>.*)"
| fields - group search
| stats values(*) as * list(field) as field list(data) as data by title

I know there is probably an optimisation in there somewhere, but it's not a critical path, just a support dashboard, so that will do.

 
