
How to extract with rex from the beginning of a string to a delimiter or the end of the line?

pm771
Communicator

How do I write a rex command that extracts everything from the beginning of a string up to a particular delimiter (such as a comma) or, if there is no delimiter, to the end of the string?

I thought of something like rex field=TEXT "(?<error>.+)(\,|$)" but it did not work.

For example:
- If TEXT is 12A-,4XYZ the result should be 12A- (up to the comma)
- If TEXT is 567+4ABC the result should be 567+4ABC (the entire string)


acharlieh
Influencer

I would simply anchor your match at the beginning of the string and match anything that is not your delimiter character, e.g.:

^(?<error>[^,]++)
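
To try the anchored negated-class idea outside Splunk, here is a minimal Python sketch. It assumes Python's re module behaves like Splunk's PCRE engine for these simple constructs. Two differences are deliberate: Python spells named groups (?P<name>...) instead of (?<name>...), and a plain + replaces the possessive ++ (possessive quantifiers only exist in Python 3.11+). The helper name extract_error is hypothetical.

```python
import re

# Anchored at the start; [^,]+ consumes every character that is not a comma,
# so the match stops at the first comma or at the end of the string.
UP_TO_COMMA = re.compile(r"^(?P<error>[^,]+)")


def extract_error(text):
    """Return everything before the first comma, or the whole string.

    Returns None when nothing precedes the first comma (e.g. ",abc"),
    because [^,]+ needs at least one non-comma character.
    """
    m = UP_TO_COMMA.match(text)
    return m.group("error") if m else None


for text in ["12A-,4XYZ", "567+4ABC"]:
    print(text, "->", extract_error(text))
```

Because [^,] can never match a comma, no lookahead or alternation is needed to stop at the delimiter.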

The problem with your existing regular expression is that . matches any character and + matches greedily, so .+ consumes the entire string first. Only then does it check for either a comma or the end of the string, and because it is already at the end of the string, that check succeeds (even though the captured text contains delimiters).
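
The greedy behavior can be reproduced with a short Python sketch, assuming Python's re treats .+ and $ the same way PCRE does here. The only change from the original attempt is Python's (?P<error>...) named-group syntax; the variable names are illustrative.

```python
import re

# The original attempt, translated to Python's named-group syntax.
GREEDY = re.compile(r"(?P<error>.+)(,|$)")

m = GREEDY.search("12A-,4XYZ")
# .+ swallows the whole string, then (,|$) succeeds via $ at the very end,
# so the capture still contains the comma.
print(m.group("error"))  # 12A-,4XYZ
print(repr(m.group(2)))  # '' (the $ branch matched, not the comma)
```

For 567+4ABC the same pattern also returns the whole string, which is why the bug only shows up when a delimiter is present.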

A second option would be to use a lazy quantifier, e.g.:

^(?<error>.+?)(?:[,]|$)

but that takes more processing steps with these particular test cases.
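
Here is the lazy variant as a minimal Python sketch, again assuming Python's re matches PCRE for these constructs and using (?P<error>...) for the named group. The lazy +? grows the capture one character at a time and tries the delimiter alternation after each step, which is where the extra processing steps come from.

```python
import re

# Lazy +? takes as few characters as possible, then tries the
# non-capturing alternation: a comma, or the end of the string.
LAZY = re.compile(r"^(?P<error>.+?)(?:[,]|$)")

for text in ["12A-,4XYZ", "567+4ABC"]:
    print(text, "->", LAZY.match(text).group("error"))
```

Both inputs yield the expected result: 12A- for the first and the entire string for the second.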

A regular expression debugger such as https://regex101.com can help you see the processing steps of your regular expression against test cases.

pm771
Communicator

It seems that in your 2nd solution just (?<error>.+?)([,]|$) would be enough.

Am I missing something here?

Thank you.

acharlieh
Influencer

You're correct: +? is the lazy one-or-more quantifier I was talking about, and it is enough for the problem at hand. The (?:x|y) syntax denotes a non-capturing group with two alternatives, x and y. Since you don't need to capture either the comma or the end-of-string match, I habitually mark such groups as non-capturing, so the regex processor doesn't have to spend time keeping the value of that group around. There are other group syntaxes (atomic groups, for example) with different trade-offs between what they match and how they perform.
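
The difference between the two variants can be seen in a small Python sketch, assuming Python's re mirrors PCRE here (named groups written as (?P<error>...)). Both patterns produce the same error value for this input; the unanchored one still matches from position 0 because the search tries the leftmost position first and .+? can start there. The only visible difference is the extra captured group. The variable names are illustrative.

```python
import re

text = "12A-,4XYZ"

# pm771's variant: no ^ anchor, and a capturing group around the delimiter.
capturing = re.search(r"(?P<error>.+?)([,]|$)", text)
# acharlieh's variant: anchored, and the delimiter group is non-capturing.
non_capturing = re.search(r"^(?P<error>.+?)(?:[,]|$)", text)

print(capturing.group("error"), capturing.groups())          # 12A- ('12A-', ',')
print(non_capturing.group("error"), non_capturing.groups())  # 12A- ('12A-',)
```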

sloshburch
Splunk Employee

https://regexr.com/ as well as https://regex101.com/ are really great!
