How to extract with rex from the beginning of a string to delimiter or the end of line?

pm771
Communicator

How do I write a rex command that extracts from the beginning of a string up to a particular delimiter (such as a comma) or, if there is no delimiter, to the end of the string?

I thought of something like rex field=TEXT "(?<error>.+)(\,|$)" but it did not work.

For example:
- If TEXT is "12A-,4XYZ", the result should be "12A-" (up to the comma)
- If TEXT is "567+4ABC", the result should be "567+4ABC" (the entire string)


acharlieh
Influencer

I would simply anchor your match at the beginning and match anything that's not your delimiter character, e.g.:

^(?<error>[^,]++)
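As a quick check outside Splunk, here is a minimal Python sketch of the same idea using the standard re module. Two assumptions: Python spells named groups (?P<error>...) rather than the PCRE-style (?<error>...) that rex accepts, and the possessive ++ is written as a plain + because Python only supports possessive quantifiers from 3.11 onward (for a negated class like [^,] the result is the same). The helper name extract_error is hypothetical.

```python
import re

# Anchored at the start; [^,]+ takes every character that is not a comma.
# PCRE/Splunk form: ^(?<error>[^,]++). The possessive ++ is written as +
# here because Python supports it only from 3.11; for [^,] the match is identical.
PATTERN = re.compile(r"^(?P<error>[^,]+)")


def extract_error(text):
    """Hypothetical helper: return the text before the first comma, or None."""
    m = PATTERN.search(text)
    return m.group("error") if m else None


for sample in ["12A-,4XYZ", "567+4ABC"]:
    print(sample, "->", extract_error(sample))
# 12A-,4XYZ -> 12A-
# 567+4ABC -> 567+4ABC
```

Note that a string starting with a comma yields no match at all, because [^,]+ needs at least one non-comma character.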

The problem with your existing regular expression is that . matches any character and + is greedy, so .+ first consumes the entire string. Only then does it check for either a comma or the end of the string, and because it is already at the end of the string, the match succeeds (even though the captured text contains the delimiter).
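The failure can be reproduced with a short Python sketch. It assumes the standard re module and changes only the named-group spelling to Python's (?P<error>...); the unnecessary escape in \, is dropped since a bare comma means the same thing.

```python
import re

# The original attempt, in Python's named-group syntax.
greedy = re.compile(r"(?P<error>.+)(,|$)")

m = greedy.search("12A-,4XYZ")
# .+ first runs to the end of the string. The alternation then tries ","
# (fails: nothing is left) and "$" (succeeds), so no backtracking happens
# and the whole string, comma included, lands in the group.
print(m.group("error"))  # 12A-,4XYZ
```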

A second option would be to use a lazy quantifier, e.g.:

^(?<error>.+?)(?:[,]|$)

but that takes more processing steps on these particular test cases.
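The lazy variant can be checked the same way. This Python sketch assumes the standard re module and rewrites (?<error>...) as Python's (?P<error>...); otherwise the pattern is the one from the answer.

```python
import re

# Lazy version: .+? grows one character at a time until the
# non-capturing group (?:[,]|$) can match right after it.
lazy = re.compile(r"^(?P<error>.+?)(?:[,]|$)")

for sample in ["12A-,4XYZ", "567+4ABC"]:
    print(sample, "->", lazy.search(sample).group("error"))
# 12A-,4XYZ -> 12A-
# 567+4ABC -> 567+4ABC
```

Because .+? starts with one character and retries (?:[,]|$) after each additional character, it does more work per match than the single pass of [^,]+.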

A regular expression debugger such as https://regex101.com can show you, step by step, how your regular expression processes each test case.

pm771
Communicator

It seems that in your 2nd solution just (?<error>.+?)([,]|$) would be enough.

Am I missing something here?

Thank you.
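The simplified pattern can be tried on the two examples with a small Python sketch (standard re module, named group rewritten as (?P<error>...)). It illustrates why dropping the ^ anchor does not matter here: search() tries position 0 first, and for non-empty, single-line inputs like these the lazy .+? always finds a match starting there.

```python
import re

# Simplified version: no ^ anchor and a plain capturing group for the delimiter.
simple = re.compile(r"(?P<error>.+?)([,]|$)")

for sample in ["12A-,4XYZ", "567+4ABC"]:
    m = simple.search(sample)
    print(sample, "->", m.group("error"))
# 12A-,4XYZ -> 12A-
# 567+4ABC -> 567+4ABC
```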


acharlieh
Influencer

You're correct: +? is the lazy one-or-more quantifier I was talking about, and it is enough for the problem at hand. The (?:x|y) syntax denotes a non-capturing group with two alternatives, x and y. Since you don't need to capture either the comma or the end of the line, I habitually mark such groups as non-capturing, so the regex engine doesn't have to keep that group's value around. There are other group syntaxes (atomic groups, for example) with different trade-offs between what they match and how they perform.
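The difference between the two group styles is visible in what the engine records. This Python sketch (standard re module, Python's (?P<error>...) naming) shows that both patterns extract the same value, but only the capturing one also stores the delimiter as an extra group. It does not measure performance.

```python
import re

capturing = re.compile(r"(?P<error>.+?)([,]|$)")
non_capturing = re.compile(r"(?P<error>.+?)(?:[,]|$)")

text = "12A-,4XYZ"
# Both extract the same value...
print(capturing.search(text).group("error"))      # 12A-
print(non_capturing.search(text).group("error"))  # 12A-
# ...but only the capturing version also records the delimiter as group 2.
print(capturing.search(text).groups())      # ('12A-', ',')
print(non_capturing.search(text).groups())  # ('12A-',)
# Pattern.groups reports how many capturing groups each pattern defines.
print(capturing.groups, non_capturing.groups)  # 2 1
```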

sloshburch
Splunk Employee

https://regexr.com/ as well as https://regex101.com/ are really great!
