Splunk Search

Regex in query

jacqu3sy
Path Finder

Hi,

Can anyone help with a regex to extract into a new field anything contained within raw data after a #?

For example, the following data from Twitter:

opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr

I'd like a way to extract and list all the content after each hash.

Thanks.

1 Solution

elliotproebstel
Champion

I think this should do what you're looking for:
| rex max_match=0 field=_raw "\#(?<extracted_field>[^\# ]*)"
The keys here are the max_match argument, which tells rex not to stop at the first match, and a slight modification to the regex that @gcusello suggested: adding a space to the excluded characters. Without that modification, I believe you will get erroneous matches, because each match would run on past the hashtag to the next # or the end of the event.
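
To see what the extra space in the character class changes, here is a rough sketch in Python rather than SPL. It assumes Python's re module behaves like rex for a pattern this simple, and it uses a plain capture group because Python spells named groups differently. The variable names are just for illustration.

```python
import re

# Sample tweet text from the question.
sample = "opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr"

# Class excludes only '#': each match runs on to the next '#' or the end of the text.
without_space = re.findall(r"#([^#]*)", sample)

# Class excludes '#' and space: each match stops at the end of the hashtag.
with_space = re.findall(r"#([^# ]*)", sample)

print(without_space)
print(with_space)
```

The first list comes out as 'opBlackOctober ', 'opSacred ' and 'opMASSHACK ENGAGED https://t.co/JiWVA4kOXr', with trailing spaces and the rest of the tweet attached. The second is just 'opBlackOctober', 'opSacred' and 'opMASSHACK'.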

Note that max_match defaults to 1. Setting it to 0 makes it unlimited, but you could set it to some other specific value if you only wanted to match a certain number of instances. Here is some info in the docs:
https://docs.splunk.com/Documentation/SplunkCloud/6.6.1/SearchReference/Rex
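
If it helps to picture the max_match behaviour, the sketch below mimics it in Python. This is only an analogy and not how Splunk implements rex: extract_hashtags and HASHTAG are hypothetical names, and the named group is written (?P<name>...) because that is Python's syntax.

```python
import re
from itertools import islice

# Same pattern as the rex command, in Python's named-group syntax.
HASHTAG = re.compile(r"#(?P<extracted_field>[^# ]*)")

def extract_hashtags(text, max_match=1):
    """Return up to max_match hashtag values; 0 means no limit (mirroring rex)."""
    matches = (m.group("extracted_field") for m in HASHTAG.finditer(text))
    if max_match == 0:
        return list(matches)
    return list(islice(matches, max_match))

sample = "opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr"

print(extract_hashtags(sample))               # default: first match only
print(extract_hashtags(sample, max_match=2))  # first two matches
print(extract_hashtags(sample, max_match=0))  # every match
```

With the default of 1 only opBlackOctober comes back, which is why the first hashtag was the only one extracted before max_match=0 was added.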


gcusello
SplunkTrust

Hi jacqu3sy,
try

| rex max_match=0 "\#(?<my_field>[^\#]*)"

Bye.
Giuseppe


jacqu3sy
Path Finder

Perfect - thank you for the detailed response. It's much appreciated.


gcusello
SplunkTrust

Hi jacqu3sy,
try this regex

\#(?<my_field>[^\#]*)

you can test it at https://regex101.com/r/Sk52x3/1

Bye.
Giuseppe


jacqu3sy
Path Finder

Not quite. It pulls out the first #hashtag within the _raw field but ignores the rest. In the example above, it would extract #opBlackOctober and nothing else.

Is there a way of extracting all? Thanks.
