Splunk Search

How to extract with rex from the beginning of a string up to a delimiter or the end of the line?

pm771
Communicator

How do I write a rex command that extracts everything from the start of a field up to a particular delimiter (such as a comma) or, if there is no delimiter, up to the end of the string?

I thought of something like rex field=TEXT "(?<error>.+)(\,|$)" but it did not work.

For example:
- If TEXT is 12A-,4XYZ the result should be 12A- (everything up to the ,)
- If TEXT is 567+4ABC the result should be 567+4ABC (the entire string)

acharlieh
Influencer

I would simply anchor your match at the beginning of the string and match anything that's not your delimiter character, e.g.:

^(?<error>[^,]++)

The problem with your existing regular expression is that . matches any character (other than a newline) and + is greedy, so .+ first consumes the entire string and only then checks for either a comma or the end of the string. Because it is already at the end of the string, the $ alternative succeeds immediately, and the match covers the whole value, delimiters included.
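This behavior can be reproduced outside Splunk with a small Python sketch. It assumes Python's `re` module behaves like the PCRE-style engine behind rex for these simple patterns. Python writes named groups as `(?P<name>...)` instead of `(?<name>...)`. The sketch uses plain `+` instead of the possessive `++`, because possessive quantifiers need Python 3.11 or later; with this anchored pattern both capture the same value. The helper `extract_error` is a hypothetical name used only for illustration.

```python
import re

# The asker's original pattern: greedy .+ followed by "comma or end of string".
# Named group rewritten in Python's (?P<name>...) syntax.
GREEDY = re.compile(r"(?P<error>.+)(,|$)")

# The suggested fix: anchor at the start and take every character that is not a comma.
# Plain + is used instead of the possessive ++; the captured value is the same here.
NEGATED = re.compile(r"^(?P<error>[^,]+)")


def extract_error(pattern, text):
    """Hypothetical helper: return the 'error' group of the first match, or None."""
    m = pattern.search(text)
    return m.group("error") if m else None


for text in ["12A-,4XYZ", "567+4ABC"]:
    print(text, "->", extract_error(GREEDY, text), "|", extract_error(NEGATED, text))
# 12A-,4XYZ -> 12A-,4XYZ | 12A-
# 567+4ABC -> 567+4ABC | 567+4ABC
```

The greedy pattern returns the whole first value because $ matches as soon as .+ has consumed everything. The negated character class stops at the first comma.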

A second option would be to use a lazy quantifier, e.g.:

^(?<error>.+?)(?:[,]|$)

but that takes more processing steps on these particular test cases.
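Here is a minimal Python sketch of the lazy variant. It assumes Python's `re` module behaves like the PCRE-style engine behind rex for this pattern, and it writes the named group as `(?P<error>...)` because Python does not accept the `(?<error>...)` spelling.

```python
import re

# Lazy variant: .+? grows one character at a time and stops at the first
# position where a comma follows or the end of the string is reached.
# (?:...) is a non-capturing group; only "error" is captured.
LAZY = re.compile(r"^(?P<error>.+?)(?:[,]|$)")

for text in ["12A-,4XYZ", "567+4ABC"]:
    m = LAZY.search(text)
    print(text, "->", m.group("error") if m else None)
# 12A-,4XYZ -> 12A-
# 567+4ABC -> 567+4ABC
```

One behavioral difference from the negated-class pattern is worth knowing. Because .+? must consume at least one character, and that character may itself be a comma, a value such as ,abc yields ,abc here, whereas ^(?<error>[^,]++) finds no match at all.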

A regular expression debugger such as https://regex101.com can show you, step by step, how your regular expression is processed against test cases.

pm771
Communicator

It seems that in your second solution, just (?<error>.+?)([,]|$) would be enough.

Am I missing something here?

Thank you.

acharlieh
Influencer

You're correct: +? is the lazy one-or-more quantifier I was talking about, and it is enough for the problem at hand. The (?:x|y) syntax denotes a non-capturing group with two alternatives, x and y. Since you don't need to capture either the comma or the end-of-string position, I habitually mark such groups as non-capturing, so the regex engine doesn't have to keep that group's value around. There are other group syntaxes (atomic groups, for example) with different trade-offs between what they match and how they perform.
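The difference between the two lazy patterns can be illustrated with a Python sketch. It assumes Python's `re` module behaves like the PCRE-style engine behind rex for these patterns and uses Python's `(?P<name>...)` spelling for the named group. It shows that both patterns extract the same error value and that only the asker's version stores an extra group. It does not measure whether the non-capturing group is actually faster in Splunk.

```python
import re

# Anchored lazy pattern with a non-capturing (?:...) group for the delimiter.
NON_CAPTURING = re.compile(r"^(?P<error>.+?)(?:[,]|$)")

# The asker's simplified pattern: no ^ anchor and a plain capturing group.
# An unanchored search still tries position 0 first and succeeds there.
CAPTURING = re.compile(r"(?P<error>.+?)([,]|$)")

# Pattern.groups is the number of capturing groups in the compiled pattern.
print(NON_CAPTURING.groups, CAPTURING.groups)  # 1 2

for text in ["12A-,4XYZ", "567+4ABC"]:
    a = NON_CAPTURING.search(text)
    b = CAPTURING.search(text)
    # Same "error" value from both; the capturing version also stores the
    # delimiter match: "," or "" when $ matched at the end of the string.
    print(text, "->", a.group("error"), b.group("error"), b.groups())
# 12A-,4XYZ -> 12A- 12A- ('12A-', ',')
# 567+4ABC -> 567+4ABC 567+4ABC ('567+4ABC', '')
```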

sloshburch
Splunk Employee

https://regexr.com/ as well as https://regex101.com/ are really great!