I want to extract from "Mozilla" to the closing quote, pulling everything up to and including 27.0". Why does my regex (\s.+") go all the way to the final quote on the other side of the word Analytics? I know the regex is poor; I'm just trying to get the concept.
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" OBSERVED "Web Ads/Analytics"
"Mozilla\/(?P<FIELD_NAME>[^"]+)
As the other answers explain, the problem is that your regex is greedy. The regex above avoids that and also generates the name for your new field: just replace "FIELD_NAME" with the desired name of your new field.
As a side note, https://regex101.com/ is a fantastic place to experiment with and hone your regex skills.
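To see what the named-group regex captures, here is a minimal sketch using Python's re module, which accepts the same (?P<name>...) syntax. The field name user_agent is a hypothetical stand-in for FIELD_NAME, and the sample line is the one from the question. Note that the captured value starts after "Mozilla/", because the group only opens after the slash.

```python
import re

# Sample event from the question.
line = ('"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" '
        'OBSERVED "Web Ads/Analytics"')

# "user_agent" is a hypothetical field name used in place of FIELD_NAME.
pattern = re.compile(r'"Mozilla/(?P<user_agent>[^"]+)')

match = pattern.search(line)
# [^"]+ stops at the first double-quote, so the capture ends at Firefox/27.0.
user_agent = match.group("user_agent") if match else None
print(user_agent)
```

If you want the word Mozilla inside the field as well, one option is to move the opening of the group so that it sits before Mozilla.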
Your regex captures more than you intend because regex quantifiers are greedy by default, so (\s.+") will match up to the last double-quote it finds on the line. Here's a revised regex that should work for you:
^\"[^"]+\"
This looks for the double-quote at the start of the line, then collects everything that's not a double-quote, followed by the next double-quote. Because [^"]+ cannot cross a quote, the greedy matching has nowhere further to go.
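The difference between the two patterns can be checked side by side. This is a small sketch in Python rather than in the search tool itself, so treat it as an illustration of the matching behavior only; the variable names are made up for the example.

```python
import re

# Sample event from the question.
line = ('"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" '
        'OBSERVED "Web Ads/Analytics"')

# Original pattern: .+ grabs as much as it can, so the match only ends
# at the LAST double-quote on the line.
greedy = re.search(r'(\s.+")', line).group(1)

# Revised pattern: [^"]+ cannot consume a double-quote, so the match
# ends at the first closing quote after the opening one.
bounded = re.search(r'^"[^"]+"', line).group(0)

print(greedy)
print(bounded)
```

The first print runs through Analytics", while the second stops right after Firefox/27.0".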
The .+ at the end of your regex is going to go all the way to the end of the line and then back up only as far as the last double-quote. This should work for the regex:
Mozilla[^)]*\)
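It will include the closing paren as well, so you can decide whether you want to keep it. Note that it stops at the first closing paren, right after rv:27.0, so it does not reach Firefox/27.0.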
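Here is a quick check of what the paren-based pattern returns on the sample line, sketched in Python. Stripping the trailing paren with rstrip is just one assumed way to drop it; the variable names are illustrative.

```python
import re

# Sample event from the question.
line = ('"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" '
        'OBSERVED "Web Ads/Analytics"')

# [^)]* consumes everything that is not a closing paren, then \) takes
# the first closing paren it reaches.
with_paren = re.search(r'Mozilla[^)]*\)', line).group(0)

# Optional: drop the trailing paren if you do not want it in the value.
without_paren = with_paren.rstrip(')')

print(with_paren)     # Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0)
print(without_paren)  # Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0
```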