topic Re: Regex in query in Splunk Search

Regex in query

jacqu3sy — Sat, 14 Oct 2017 15:38:16 GMT

Hi,

Can anyone help with a regex to extract into a new field anything contained within raw data after a #?

For example, the following data from Twitter:

opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr

I'd like a way to extract and list all the content after each hash.

Thanks.

Re: Regex in query

gcusello — Sat, 14 Oct 2017 15:51:27 GMT

Hi jacqu3sy,
try this regex

\#(?<my_field>[^\#]*)

You can test it at https://regex101.com/r/Sk52x3/1

Bye.
Giuseppe

Re: Regex in query

jacqu3sy — Sat, 14 Oct 2017 17:18:27 GMT

Not quite. It pulls out the first #hashtag within the _raw field, but ignores the others. So in the example above, it would extract #opBlackOctober, but ignore the others.

Is there a way of extracting all? Thanks.

Re: Regex in query

elliotproebstel — Sat, 14 Oct 2017 23:43:10 GMT

I think this should do what you're looking for:
| rex max_match=0 field=_raw "\#(?<extracted_field>[^\# ]*)"
The keys here are the max_match argument, which tells rex not to stop at the first match, and a slight modification to the regex that @gcusello suggested: adding a space to the excluded characters. Without that modification, I believe you will get erroneous matches, because each match would run on to the next # and pick up the trailing space and, for the last hashtag, the rest of the tweet.

Note that max_match defaults to 1. Setting it to 0 makes it unlimited, but you could set it to a specific value if you only wanted to match a certain number of instances. Here is some info in the docs:
https://docs.splunk.com/Documentation/SplunkCloud/6.6.1/SearchReference/Rex
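
To see the difference between the two character classes outside of Splunk, here is a minimal sketch in Python. It is only an illustration, not how rex works internally: Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and search/findall are used here as rough stand-ins for max_match=1 and max_match=0. The variable names are made up for the example.

```python
import re

# Sample tweet text from the original question.
raw = "opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr"

# Without the space in the character class, each match runs on until the next '#'.
greedy = re.compile(r"#(?P<my_field>[^#]*)")
# With the space excluded, each match stops at the first space.
tight = re.compile(r"#(?P<extracted_field>[^# ]*)")

# Rough analogue of the default max_match=1: only the first match is kept.
first_only = tight.search(raw).group("extracted_field")

# Rough analogue of max_match=0: every match is kept.
all_greedy = greedy.findall(raw)
all_tight = tight.findall(raw)

print(first_only)
print(all_greedy)
print(all_tight)
```

With the space excluded, the list contains only the three hashtags. Without it, the first two entries carry a trailing space and the last one swallows the rest of the tweet, which is the kind of erroneous match mentioned above.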

Re: Regex in query

gcusello — Sun, 15 Oct 2017 08:04:27 GMT

Hi jacqu3sy,
try

| rex max_match=0 "\#(?<my_field>[^\#]*)"

Bye.
Giuseppe

Re: Regex in query

jacqu3sy — Sun, 15 Oct 2017 09:46:59 GMT

Perfect - thank you for the detailed response. It's much appreciated.