topic Re: Strange behavior in regex extraction in Splunk Search

Strange behavior in regex extraction

edrivera3 — Thu, 21 May 2015 15:02:47 GMT

Hi
I want to extract the multi-value field "step", and this is what my event looks like:

STEP: 1005
RESULT: PASS
ACTUAL:
RETRIES: 1

STEP: 1006
RESULT: PASS
ACTUAL:

STEP: 1009
RESULT: PASS
EXPECTED: 90.5
ACTUAL: 91.0
STEP: 1011
RESULT: PASS
ACTUAL:

STEP: 1015
RESULT: PASS
ACTUAL:

I have the following regex:
... | rex "(?<step>STEP:\s{6}\d+[\w\W\n]+?)STEP:\s{6}" max_match=0

But for some strange reason this regex skips every other step, so I only extracted steps 1005, 1009, and 1015. I believe the problem is related to how the regex consumes the text. After a step is extracted, the regex has already passed the "STEP:\s{6}" of the next step, so it cannot start a match there and continues forward until it reaches the following step.

This is what I extracted in the field "step":
STEP: 1005
RESULT: PASS
ACTUAL:

RETRIES: 1

STEP: 1009
RESULT: PASS
EXPECTED: 90.5
ACTUAL: 91.0

STEP: 1015
RESULT: PASS
ACTUAL:

As you can see, I am catching the correct pattern with this regex. Please let me know what I can do to extract all the values for this field.
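The skipping described above can be reproduced outside Splunk. The following is a minimal sketch in Python, not SPL: Python's `re` module writes named groups as `(?P<step>...)` instead of `(?<step>...)`, and the `event` string is an abbreviated copy of the sample with single spaces after "STEP:", so the reduced form of the regex (without `\s{6}`) is used. With only these five steps, 1015 cannot match because no further "STEP:" follows it; the original event presumably continued past what was posted.

```python
import re

# Abbreviated copy of the sample event (an assumption: single spaces, five steps only).
event = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# The trailing "STEP:" is part of the match, so it is consumed and the
# next search begins after it, inside the following step.
consuming = re.compile(r"(?P<step>STEP:[\w\W\n]+?)STEP:")

captured = [m.group("step") for m in consuming.finditer(event)]
consumed_ids = [re.search(r"STEP:\s*(\d+)", block).group(1) for block in captured]

print(consumed_ids)  # steps 1006 and 1011 are skipped
```

Each match swallows the "STEP:" that opens the next block, which is why every other step disappears.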

Re: Strange behavior in regex extraction

edrivera3 — Thu, 21 May 2015 15:08:54 GMT

I just realized that I can reduce my regex to simply:
... | rex "(?<step>STEP:[\w\W\n]+?)STEP:" max_match=0

This regex gives me the same results, so it doesn't change anything. 😞

Re: Strange behavior in regex extraction

richgalloway — Thu, 21 May 2015 15:23:14 GMT

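Rex was skipping STEPs because your regex requires two instances of "STEP:" for each match, and the second one is consumed, so the next match cannot start there. A lookahead helps because it checks for the next STEP without consuming it. Regex101.com works with this regex string: (?<step>STEP:[\w\W\n]+?)(?=STEP|$)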
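To see the effect of the lookahead, here is a small sketch in Python rather than SPL. It assumes Python's `(?P<step>...)` spelling for the named group and uses an abbreviated copy of the sample event; it shows how a standard regex engine treats the pattern, not necessarily how `rex` behaves on a given Splunk instance.

```python
import re

# Abbreviated copy of the sample event (an assumption: single spaces, five steps only).
event = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# (?=STEP|$) only peeks ahead: the next "STEP" is left in place,
# so the following match can start right on it.
lookahead = re.compile(r"(?P<step>STEP:[\w\W\n]+?)(?=STEP|$)")

blocks = [m.group("step") for m in lookahead.finditer(event)]
lookahead_ids = [re.search(r"STEP:\s*(\d+)", block).group(1) for block in blocks]

print(lookahead_ids)
```

Because nothing after the capture group is consumed, all five steps are returned, each with its RESULT and ACTUAL lines.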

Re: Strange behavior in regex extraction

edrivera3 — Thu, 21 May 2015 15:55:32 GMT

Thank you. That's what I needed, a lookahead! This is my regex now:
(?<step>[\w\W\n]+?)(?=STEP)

Re: Strange behavior in regex extraction

richgalloway — Thu, 21 May 2015 15:58:28 GMT

That regex will probably miss the last STEP. That's why my regex string included |$.
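The missing last STEP can be checked with a short sketch, again in Python as a stand-in for `rex` (so the named group is written `(?P<step>...)`), against an abbreviated copy of the sample event. The final block has no "STEP" after it, so a lookahead that only accepts "STEP" never succeeds there.

```python
import re

# Abbreviated copy of the sample event (an assumption: single spaces, five steps only).
event = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# Lookahead for "STEP" only: every block must be followed by another STEP.
no_end_anchor = re.compile(r"(?P<step>[\w\W\n]+?)(?=STEP)")

found = [m.group("step") for m in no_end_anchor.finditer(event)]
found_ids = [re.search(r"STEP:\s*(\d+)", block).group(1) for block in found]

print(found_ids)  # 1015 is absent: nothing follows it to satisfy (?=STEP)
```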

Re: Strange behavior in regex extraction

edrivera3 — Thu, 21 May 2015 17:02:59 GMT

You are right, I am missing the last STEP, but when I include "|$" I only extract the first line of each step:
This is what I extracted in the field "step":
STEP: 1005
STEP: 1006
STEP: 1009
STEP: 1011
STEP: 1015

So I would rather miss the last step than lose information from all the other steps. Do you have any idea how to avoid this?

Re: Strange behavior in regex extraction

richgalloway — Thu, 21 May 2015 17:14:48 GMT

Examine your data closely to see if there is anything else you can use as a terminator.
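One possible explanation for the first-line-only result, which is an assumption and not confirmed anywhere in this thread, is that `$` was matching at the end of every line, as it does when a regex runs in multiline mode. The Python sketch below reproduces that output with `re.MULTILINE` and then swaps `$` for an end-of-string anchor as the terminator. Python spells that anchor `\Z`; the closest PCRE equivalent is `\z`. Whether `rex` behaves the same way would need to be tested on the actual data.

```python
import re

# Abbreviated copy of the sample event (an assumption: single spaces, five steps only).
event = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# In multiline mode "$" matches at every line end, so the lazy
# quantifier stops as soon as the first line of each step is done.
per_line = re.compile(r"(?P<step>STEP:[\w\W\n]+?)(?=STEP|$)", re.MULTILINE)
first_lines = [m.group("step") for m in per_line.finditer(event)]

# "\Z" matches only at the very end of the text, even in multiline mode,
# so each block runs up to the next STEP or to the end of the event.
end_of_text = re.compile(r"(?P<step>STEP:[\w\W\n]+?)(?=STEP|\Z)", re.MULTILINE)
full_blocks = [m.group("step") for m in end_of_text.finditer(event)]

print(first_lines)
print(len(full_blocks))
```

If the data offers no natural terminator, anchoring on the absolute end of the event is one candidate worth trying.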