Regex in query

jacqu3sy
Path Finder

Hi,

Can anyone help with a regex that extracts whatever follows a # in the raw data into a new field?

For example, take the following data from Twitter:

opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr

I'd like a way to extract and list all the content after each hash.

Thanks.


gcusello
SplunkTrust

Hi jacqu3sy,
try

| rex max_match=0 "\#(?<my_field>[^\#]*)"

Bye.
Giuseppe


elliotproebstel
Champion

I think this should do what you're looking for:
| rex max_match=0 field=_raw "\#(?<extracted_field>[^\# ]*)"
The keys here are the max_match argument, which tells rex not to stop at the first match, and a slight modification to the regex that @gcusello suggested (adding a space to the excluded characters). Without that modification, each match runs on to the next # or to the end of the event, so the values pick up trailing spaces and the last one becomes "opMASSHACK ENGAGED https://t.co/JiWVA4kOXr".
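
If you want to check the difference between the two character classes outside Splunk, here is a rough sketch in Python. It is only a stand-in for rex: Python's re module writes named groups as (?P<name>...) where rex uses (?<name>...), and the helper name extract_all is made up for this illustration.

```python
import re

# Sample event text from the question.
raw = "opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr"

# Python names groups with (?P<name>...); Splunk's rex uses (?<name>...).
without_space = re.compile(r"#(?P<extracted_field>[^#]*)")   # stops only at the next #
with_space = re.compile(r"#(?P<extracted_field>[^# ]*)")     # stops at a # or a space


def extract_all(pattern, text):
    """Return every value captured by the named group (hypothetical helper)."""
    return [m.group("extracted_field") for m in pattern.finditer(text)]


loose = extract_all(without_space, raw)
tight = extract_all(with_space, raw)

print(loose)
print(tight)
```

The first list should contain values with trailing spaces and a last entry that swallows the rest of the tweet, while the second should contain just the three tag names.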

Note that max_match defaults to 1. Setting it to 0 removes the limit, but you could set it to some other specific value if you only wanted to match a certain number of instances. Here is some info in the docs:
https://docs.splunk.com/Documentation/SplunkCloud/6.6.1/SearchReference/Rex
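
To make the max_match behaviour concrete, here is a small Python sketch that imitates it. This is an approximation for illustration, not how rex is implemented: the function extract and its max_match parameter are hypothetical and only mirror the semantics described above (1 by default, 0 for no limit, N for at most N matches).

```python
import re
from itertools import islice

# Sample event text from the question.
raw = "opIcarus #opBlackOctober #opSacred #opMASSHACK ENGAGED https://t.co/JiWVA4kOXr"

# Same pattern as the rex example, in Python's named-group syntax.
pattern = re.compile(r"#(?P<extracted_field>[^# ]*)")


def extract(text, max_match=1):
    """Imitate rex's max_match: 0 means no limit, N means at most N matches."""
    matches = pattern.finditer(text)
    if max_match > 0:
        matches = islice(matches, max_match)
    return [m.group("extracted_field") for m in matches]


first_only = extract(raw)                # default: stops after the first match
first_two = extract(raw, max_match=2)    # capped at two matches
everything = extract(raw, max_match=0)   # no cap, every hashtag

print(first_only)
print(first_two)
print(everything)
```

With the default, only opBlackOctober comes back, which matches the "first hashtag only" behaviour reported elsewhere in this thread.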


jacqu3sy
Path Finder

Perfect - thank you for the detailed response. It's much appreciated.


gcusello
SplunkTrust

Hi jacqu3sy,
try this regex

\#(?<my_field>[^\#]*)

you can test it at https://regex101.com/r/Sk52x3/1

Bye.
Giuseppe


jacqu3sy
Path Finder

Not quite. It pulls out the first #hashtag within the _raw field, but ignores the others. So in the example above, it would extract #opBlackOctober, but ignore the others.

Is there a way of extracting all? Thanks.
