Splunk Search

Strange behavior in regex extraction

edrivera3
Builder

Hi,
I want to extract the multi-value field "step". This is what my event looks like:

STEP: 1005
RESULT: PASS
ACTUAL:
RETRIES: 1

STEP: 1006
RESULT: PASS
ACTUAL:

STEP: 1009
RESULT: PASS
EXPECTED: 90.5
ACTUAL: 91.0
STEP: 1011
RESULT: PASS
ACTUAL:

STEP: 1015
RESULT: PASS
ACTUAL:

I have the following regex:
... | rex "(?<step>STEP:\s{6}\d+[\w\W\n]+?)STEP:\s{6}" max_match=0

But for some strange reason this regex skips every other step, so I only extracted steps 1005, 1009, and 1015. I believe the problem is in the way the regex consumes the text. After a step is extracted, the regex has already consumed the "STEP:\s{6}" of the next step, so it cannot start a match there and continues forward until it reaches the following step.
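The skipping described above can be reproduced outside Splunk. The following is a minimal Python sketch, not the poster's actual search: the event text is a simplified copy of the sample (single spaces after "STEP:"), Python's `(?P<step>...)` stands in for rex's `(?<step>...)`, and `re.findall` plays the role of `max_match=0`. Because this shortened sample ends at step 1015, that last step has no following "STEP:" and is not captured here either.

```python
import re

# Simplified copy of the event from the post (an assumption: single spaces).
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# Same shape as the rex pattern: the trailing "STEP:\s+" is part of the match,
# so it is consumed and the next search starts after it.
consuming = re.compile(r"(?P<step>STEP:\s+\d+[\w\W]+?)STEP:\s+")

blocks = consuming.findall(EVENT)
step_numbers = [re.search(r"\d+", b).group() for b in blocks]
print(step_numbers)  # ['1005', '1009']
```

Steps 1006 and 1011 are lost because their "STEP:" prefix was used up as the terminator of the previous match.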

This is what I extracted in the field "step":
STEP: 1005
RESULT: PASS
ACTUAL:

RETRIES: 1


STEP: 1009
RESULT: PASS
EXPECTED: 90.5
ACTUAL: 91.0


STEP: 1015
RESULT: PASS
ACTUAL:

As you can see, the regex captures the correct pattern for the steps it does match. Please let me know what I can do to extract all the values for this field.


richgalloway
SplunkTrust

Rex was skipping STEPs because your regex string called for two instances of "STEP" to constitute a match. Using a lookahead helps. This regex string works on regex101.com: (?<step>STEP:[\w\W\n]+?)(?=STEP|$).

---
If this reply helps you, Karma would be appreciated.
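To illustrate the lookahead suggestion, here is a short Python sketch under the same kind of assumptions as any out-of-Splunk test: the event text is a simplified copy of the sample from the question, `(?P<step>...)` is Python's spelling of the named group, and `re.findall` stands in for `max_match=0`. It is not a statement about how rex behaves on the real data.

```python
import re

# Simplified copy of the event from the question (an assumption).
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# The lookahead (?=STEP|$) checks for the next "STEP" (or the end of the
# text) without consuming it, so the next match can start right there.
lookahead = re.compile(r"(?P<step>STEP:[\w\W]+?)(?=STEP|$)")

blocks = lookahead.findall(EVENT)
step_numbers = [re.search(r"\d+", b).group() for b in blocks]
print(step_numbers)  # ['1005', '1006', '1009', '1011', '1015']
```

In this sketch `$` is used without any multiline flag, so it only matches at the end of the text.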

edrivera3
Builder

Thank you. That's what I needed: a lookahead! This is my regex now:
(?<step>[\w\W\n]+?)(?=STEP)


richgalloway
SplunkTrust

That regex will probably miss the last STEP. That's why my regex string included |$.
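The point about the last STEP can be checked with a small Python sketch. It assumes a simplified copy of the sample event and uses Python's `(?P<step>...)` in place of `(?<step>...)`; the pattern is otherwise the one quoted in the previous post, with only `(?=STEP)` as the terminator.

```python
import re

# Simplified copy of the event from the question (an assumption).
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# Every match must be followed by another "STEP"; the final block never is.
no_end_anchor = re.compile(r"(?P<step>[\w\W]+?)(?=STEP)")

blocks = no_end_anchor.findall(EVENT)
step_numbers = [re.search(r"\d+", b).group() for b in blocks]
print(step_numbers)  # ['1005', '1006', '1009', '1011']
```

Step 1015 is dropped because nothing after it satisfies the lookahead.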


edrivera3
Builder

You are right, I am missing the last STEP, but when I include "|$" I only extract the first line of each step.
This is what I extracted in the field "step":
STEP: 1005
STEP: 1006
STEP: 1009
STEP: 1011
STEP: 1015

So I would rather miss the last step than lose information from all the other steps. Do you have any idea how to avoid this?
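One possible explanation, which this thread does not confirm, is that `$` was matching at the end of every line rather than only at the end of the event. The Python sketch below reproduces the same symptom by turning on `re.MULTILINE`, then swaps `$` for `\Z`, which in Python matches only at the very end of the string. Splunk's documentation describes its regex flavor as PCRE, where the absolute end-of-subject anchor is written `\z`; whether that resolves the case above is untested here. The event text is a simplified copy of the sample.

```python
import re

# Simplified copy of the event from the question (an assumption).
EVENT = (
    "STEP: 1005\nRESULT: PASS\nACTUAL:\nRETRIES: 1\n\n"
    "STEP: 1006\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1009\nRESULT: PASS\nEXPECTED: 90.5\nACTUAL: 91.0\n"
    "STEP: 1011\nRESULT: PASS\nACTUAL:\n\n"
    "STEP: 1015\nRESULT: PASS\nACTUAL:\n"
)

# Assumption: "$" behaves as end-of-line. The lazy match then stops at the
# first line break, which reproduces the one-line-per-step output.
per_line = re.compile(r"(?P<step>STEP:[\w\W]+?)(?=STEP|$)", re.MULTILINE)
truncated = per_line.findall(EVENT)
print(truncated[:2])  # ['STEP: 1005', 'STEP: 1006']

# \Z matches only at the very end of the string, regardless of re.MULTILINE.
end_only = re.compile(r"(?P<step>STEP:[\w\W]+?)(?=STEP|\Z)", re.MULTILINE)
full = end_only.findall(EVENT)
print(len(full))              # 5
print(full[-1].splitlines())  # ['STEP: 1015', 'RESULT: PASS', 'ACTUAL:']
```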


richgalloway
SplunkTrust

Examine your data closely to see if there is anything else you can use as a terminator.


edrivera3
Builder

I just realized that I can reduce my regex to simply:
... | rex "(?<step>STEP:[\w\W\n]+?)STEP:" max_match=0

This regex gives me the same results, so it doesn't change anything. 😞
