Solved: Re: How to extract with rex from the beginning of ...

pm771 · ‎07-02-2018

How do I write a rex command to extract from the beginning of a string up to a particular delimiter (such as a comma) or, if there is no delimiter, to the end of the string?

I thought of something like rex field=TEXT "(?<error>.+)(\,|$)" but it did not work.

For example:
- If TEXT is 12A-,4XYZ result should be 12A- (up to ,)
- If TEXT is 567+4ABC result should be 567+4ABC (the entire string)

acharlieh · ‎07-02-2018

I would have your match simply be anchored at the beginning and match anything that's not your delimiter character e.g.:

^(?<error>[^,]++)

The problem with your existing regular expression is that . matches any character and + matches greedily, so .+ consumes the entire string first. Only then does the engine check for either a comma or the end of the string. Because it is already at the end of the string, the match succeeds, even though the captured text still contains delimiters.
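To see the difference outside of Splunk, here is a minimal sketch using Python's re module as a stand-in for the regex engine behind rex (that substitution is an assumption, and the variable names are only illustrative). Python spells named groups (?P<error>...) rather than (?<error>...), and the possessive ++ needs Python 3.11 or later, so the sketch uses a plain + which captures the same text for these samples.

```python
import re

# Assumption: Python's `re` stands in for the engine behind Splunk's rex.
# Python writes named groups as (?P<name>...) instead of (?<name>...).
greedy = re.compile(r"(?P<error>.+)(,|$)")   # pattern from the question
negated = re.compile(r"^(?P<error>[^,]+)")   # anchored, "anything but a comma"

samples = ["12A-,4XYZ", "567+4ABC"]

# Greedy .+ swallows the whole string, then $ matches at the very end.
greedy_results = [greedy.search(s).group("error") for s in samples]

# The negated class cannot cross a comma, so it stops at the delimiter.
negated_results = [negated.search(s).group("error") for s in samples]

print(greedy_results)   # ['12A-,4XYZ', '567+4ABC']
print(negated_results)  # ['12A-', '567+4ABC']
```

The first list shows why the original attempt "did not work": the capture includes the comma and everything after it.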

A second option would be to use a lazy quantifier, e.g.:

^(?<error>.+?)(?:[,]|$)

but of course that takes more steps of processing with these particular test cases.
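As a quick check that the lazy variant gives the same captures on the two test cases, here is a small sketch, again assuming Python's re module as a stand-in for rex (the extract helper is hypothetical, not a Splunk function). It does not measure processing steps; a debugger is the better tool for that.

```python
import re

# Assumption: Python's `re` stands in for the engine behind Splunk's rex.
lazy = re.compile(r"^(?P<error>.+?)(?:[,]|$)")


def extract(text):
    """Return the lazily matched prefix, or None when nothing matches."""
    match = lazy.search(text)
    return match.group("error") if match else None


# .+? grows one character at a time until a comma or the end follows it.
lazy_results = [extract(s) for s in ["12A-,4XYZ", "567+4ABC"]]
print(lazy_results)  # ['12A-', '567+4ABC']

# .+? still needs at least one character, so an empty string does not match.
empty_result = extract("")
print(empty_result)  # None
```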

A regular expression debugger like https://regex101.com can help you see the steps your regular expression takes when processing your test cases.

pm771 · ‎07-03-2018

It seems that in your 2nd solution just (?<error>.+?)([,]|$) would be enough.

Am I missing something here?

Thank you.

acharlieh · ‎07-03-2018

You're correct: +? is the lazy one-or-more quantifier I was talking about, and it is enough for the problem at hand. The (?:x|y) syntax denotes a non-capturing group with two alternatives, x and y. Since you don't need to capture either the comma or the end-of-line position, I mark such groups as non-capturing by habit, because there's no need for your regex processor to spend time keeping the value of that group around. There are other group syntaxes (atomic groups, for example) with different trade-offs between behavior and performance.
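To make the capturing versus non-capturing difference concrete, here is a short sketch, assuming Python's re module as a stand-in for the rex engine (Python writes the named group as (?P<error>...)). Both patterns extract the same error value. The only difference is whether the delimiter is kept as an extra group.

```python
import re

# Assumption: Python's `re` stands in for the engine behind Splunk's rex.
capturing = re.compile(r"(?P<error>.+?)([,]|$)")        # delimiter is captured
non_capturing = re.compile(r"(?P<error>.+?)(?:[,]|$)")  # delimiter is discarded

text = "12A-,4XYZ"

capturing_groups = capturing.search(text).groups()
non_capturing_groups = non_capturing.search(text).groups()

print(capturing_groups)      # ('12A-', ',')
print(non_capturing_groups)  # ('12A-',)
```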

sloshburch · ‎07-03-2018

https://regexr.com/ as well as https://regex101.com/ are really great!
