topic Re: Basic Regex Field Extraction in Splunk Search

Basic Regex Field Extraction

JPrictoe — Wed, 28 Mar 2018 17:37:13 GMT

I want to extract from "Mozilla" to the closing quote, pulling everything up to and including 27.0". Why does my regex (\s.+") go all the way to the final quote on the other side of the word Analytics? I know the regex is poor; I'm just trying to get the concept.

 "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" OBSERVED "Web Ads/Analytics"

Re: Basic Regex Field Extraction

cpetterborg — Wed, 28 Mar 2018 17:48:05 GMT

The .+ in your regex is greedy, so it is going to run all the way to the last double quote on the line. This should work for the regex:

Mozilla[^)]*\)

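It matches from Mozilla through the first closing paren, the one after rv:27.0, and includes that paren, so you can decide whether you want to keep it. Note that this stops short of Firefox/27.0. To run up to the closing quote instead, use Mozilla[^"]*" (a quote in place of the paren).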
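To see where each pattern stops, here is a minimal sketch using Python's re module rather than Splunk itself (Splunk uses PCRE, and these simple patterns should behave the same way in both). The variable names are made up for illustration, and the second pattern is an illustrative variant that ends at the next double quote.

```python
import re

# Sample event from the question above.
line = '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" OBSERVED "Web Ads/Analytics"'

# [^)]* consumes anything that is not ")", so the match ends at the first ")".
to_paren = re.search(r'Mozilla[^)]*\)', line).group(0)

# Variant that stops at the next double quote instead of the paren.
to_quote = re.search(r'Mozilla[^"]*"', line).group(0)

print(to_paren)
print(to_quote)
```

The first match ends right after rv:27.0), while the second continues through Firefox/27.0 and its closing quote.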

Re: Basic Regex Field Extraction

elliotproebstel — Wed, 28 Mar 2018 17:56:48 GMT

Your regex captures more than you intend because regex quantifiers are greedy by default. So (\s.+") will match from the first whitespace up to the last double quote it finds on the line. Here's a revised regex that should work for you:

^\"[^"]+\"

This anchors on the double quote at the start of the line, collects everything that is not a double quote, and stops at the next double quote. Because [^"]+ cannot cross a quote, the greedy quantifier has nowhere further to run.
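The difference is easy to check outside Splunk. The following is a rough sketch in Python's re module, not Splunk's own engine, comparing the original greedy pattern with the negated character class. The lazy form (.+?) is an extra alternative not mentioned in the posts, and the variable names are illustrative only.

```python
import re

# Sample event from the question (without leading whitespace, so ^" can anchor).
line = '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" OBSERVED "Web Ads/Analytics"'

# Greedy: .+ grabs as much as possible, then backtracks only to the LAST quote.
greedy = re.search(r'\s.+"', line).group(0)

# Negated class: [^"]+ cannot cross a quote, so the match ends at the next one.
negated = re.search(r'^"[^"]+"', line).group(0)

# Lazy quantifier: .+? grabs as little as possible, stopping at the FIRST quote after it.
lazy = re.search(r'\s.+?"', line).group(0)

print(greedy)
print(negated)
print(lazy)
```

The greedy match runs to the quote after Analytics, while the other two stop at the quote after Firefox/27.0. Note that the \s patterns start at the first space on the line, so they do not include "Mozilla/5.0 itself.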

Re: Basic Regex Field Extraction

paulbannister — Thu, 29 Mar 2018 13:59:35 GMT

"Mozilla\/(?P<FIELD_NAME>[^"]+)

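As noted above, the original regex is greedy. The regex above stops at the closing quote and will also generate the name for your new field: just replace FIELD_NAME with the desired name of your new field. Note that the captured value starts after "Mozilla/, so the field will not include the word Mozilla itself.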
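As a quick illustration of what the named group captures, here is a small sketch in Python's re module, which accepts the same (?P<name>...) syntax. The field name user_agent is a hypothetical stand-in for FIELD_NAME, and this is a check of the pattern only, not of Splunk's extraction pipeline.

```python
import re

# Sample event from the question above.
line = '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" OBSERVED "Web Ads/Analytics"'

# "user_agent" is a hypothetical field name used in place of FIELD_NAME.
# The group starts after '"Mozilla/' and stops before the next double quote.
match = re.search(r'"Mozilla/(?P<user_agent>[^"]+)', line)

# groupdict() maps each named group to the text it captured.
fields = match.groupdict()
print(fields)
```

In Splunk the same pattern would typically be passed to the rex command, with the double quotes inside the regex escaped.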

As a side note, https://regex101.com/ is a fantastic place to experiment with and hone your regex skills.