Splunk Search

Why does my rex search with a regex only match a few of the results even with max_match?

roayers
Explorer

I have an odd issue that I can't seem to get beyond. It might be as simple as a regex change, but I can't seem to figure it out.

Here is the query that extracts the matches into a field called freq:

| rex field=_raw max_match=0 "[\.^\d^\,|\|]*(?<freq>[^\d]\d{2,3}\.\d{1,4}[^\,^\W]+)" | table sender body subject freq

Here is the field called body:

22111 ____________________ Message Body ____________________ Fog moved out fairly early after storms,got 3 T-6B from Whiting in here, TEXAN 163,166163,Black Bird 010 166010 and Texan 061 166061 all up 118.05,127.6 118.75 Final Controller for ASR approaches and Red Knight 164 166164 passed over head toward Nashville 120.8,128.15 John Doe Dallas,Texas 12 Miles Northwest KHSVPro-97,Pro-197,Pro-2055,Psr-400,Pro-2052,Pro-163,BCD-996XT(2),BCD996P2(2)

The rex command finds these

118.05
118.75
128.15

But it should find these matches:

118.05
127.6
118.75
120.8
128.15

Thanks for any help with the regex, which I believe is the issue. I'm fairly new to regular expressions. The problem is that these come from email message bodies and there is no consistency between delimiters. Some have a space, comma, pipe, semicolon, colon, or dash, so I have to account for all of the possibilities.

0 Karma
1 Solution

somesoni2
SplunkTrust

Give this a try

| rex field=_raw max_match=0 "(,|\s)(?<freq>\d{2,3}\.\d{1,4})" | table sender body subject freq
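
To see why the accepted pattern finds more values, both patterns can be run outside Splunk. The sketch below uses Python's `re` module as a stand-in for the PCRE engine behind `rex` (an assumption: the two should agree on these simple constructs) against an excerpt of the sample body. It uses only the core of the original pattern, with the capture moved past the leading delimiter so the values print cleanly. The trailing `[^\,^\W]+` in the original demands at least one more word character after the decimals, so single-decimal values such as 127.6 and 120.8 cannot match.

```python
import re

# Excerpt of the sample body from the question.
body = ("all up 118.05,127.6 118.75 Final Controller for ASR approaches and "
        "Red Knight 164 166164 passed over head toward Nashville 120.8,128.15 John Doe")

# Core of the original pattern: a non-digit, the number, then one or more
# characters that are word characters other than "," or "^".
original = re.compile(r"[^\d](\d{2,3}\.\d{1,4}[^\,^\W]+)")

# Accepted pattern: a comma or whitespace, then the number, nothing required after.
accepted = re.compile(r"(?:,|\s)(\d{2,3}\.\d{1,4})")

print(original.findall(body))  # ['118.05', '118.75', '128.15']
print(accepted.findall(body))  # ['118.05', '127.6', '118.75', '120.8', '128.15']
```

Note that Python spells named groups `(?P<name>...)`, whereas `rex` uses `(?<name>...)`; the sketch sidesteps this by using unnamed groups.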

0 Karma

woodcock
Esteemed Legend

Try this:

| rex field=_raw max_match=0 "(?<=[\s,])(?<freq>\d{2,3}\.\d{1,4})(?=[,\s])" | table sender body subject freq
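
The lookaround version checks the delimiters without consuming them, but it still only accepts a comma or whitespace on each side. The sketch below, again using Python's `re` as a stand-in for `rex` (an assumption), runs it against an excerpt of the sample body and against a made-up string that uses other delimiters. The `loose` pattern is a hypothetical alternative, not something proposed in this thread: it only requires that the number is not glued to another digit or dot, so it does not care which delimiter is used.

```python
import re

# Excerpt of the sample body from the question.
body = ("all up 118.05,127.6 118.75 Final Controller for ASR approaches and "
        "Red Knight 164 166164 passed over head toward Nashville 120.8,128.15 John Doe")

# Made-up text with semicolon and pipe delimiters and a value at the very end.
other = "try 118.05;127.6|120.8"

# Lookaround pattern from the answer above (unnamed group for Python).
lookaround = re.compile(r"(?<=[\s,])(\d{2,3}\.\d{1,4})(?=[,\s])")

# Hypothetical delimiter-agnostic variant using negative lookarounds.
loose = re.compile(r"(?<![\d.])(\d{2,3}\.\d{1,4})(?![\d.])")

print(lookaround.findall(body))   # ['118.05', '127.6', '118.75', '120.8', '128.15']
print(lookaround.findall(other))  # []
print(loose.findall(body))        # ['118.05', '127.6', '118.75', '120.8', '128.15']
print(loose.findall(other))       # ['118.05', '127.6', '120.8']
```

The negative lookarounds are looser by design, so they may also pick up other dotted numbers (version strings, prices) that the stricter pattern would skip.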
0 Karma

horsefez
SplunkTrust

You can escape characters like < and > by writing them as &lt; and &gt;.

0 Karma

roayers
Explorer

Awesome!!!! That worked like it was supposed to. Now that you've got that working, here's one more: what would I have to add to the regex to capture the 15 characters before and after each ###.### and put them into fields called pre and post? They would have to be extracted, as they don't exist now.

0 Karma

roayers
Explorer

That would be any of the matches that were extracted with the regex, like these:

118.05
127.6
118.75
120.8
128.15

0 Karma

somesoni2
SplunkTrust

Try adding this to the search:

current search | rex field=_raw max_match=0 "(?<pre>\S{15})\d{2,3}\.\d{1,4})(?<post>\S{15})"
0 Karma

roayers
Explorer

I get this error:
"(?<pre>\S{15})\d{2,3}\.\d{1,4})(?<post>\S{15}"': Regex: unmatched closing parenthesis
If I change it to "((?<pre>\S{15})\d{2,3}\.\d{1,4})(?<post>\S{15})" it runs, but it doesn't extract anything into any of the three fields pre, freq, and post.

0 Karma

horsefez
SplunkTrust

This regex trick is, in my opinion, really hard to pull off, if it's possible at all.
I tried to work on it by using the \G anchor, but had no success yet.

I believe only a true regex legend like duckfez could help here
@dwaddle

0 Karma

roayers
Explorer

Thanks for trying 🙂

0 Karma

somesoni2
SplunkTrust

What exactly should show up in the pre and post fields (given the example data in your question)?

0 Karma

roayers
Explorer

That gives me this error:
"(?<pre>\S{15})\d{2,3}\.\d{1,4})(?<post>\S{15}"': Regex: unmatched closing parenthesis

Here is the whole search together:

index=mail sourcetype=imap sender=comm | rex field=_raw max_match=0 "(?<=[\s,])(?<freq>\d{2,3}\.\d{1,4})(?=[,\s])" | rex field=_raw max_match=0 "(?<pre>\S{15})\d{2,3}\.\d{1,4})(?<post>\S{15})" | table sender recipient size body subject freq

0 Karma

somesoni2
SplunkTrust

Yup, there was an extra bracket in the second regular expression. The search below should run fine, but I'm not sure it's extracting the correct values for the pre and post fields. If it doesn't give you what you expect, could you provide the expected output for pre and post?

index=mail sourcetype=imap sender=comm 
| rex field=_raw max_match=0 "(,|\s)(?<freq>\d{2,3}\.\d{1,4})" 
| rex field=_raw max_match=0 "(?<pre>\S{15})\d{2,3}\.\d{1,4}(?<post>\S{15})"
| table sender body subject pre freq post
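
Both symptoms reported earlier in the thread, the "unmatched closing parenthesis" error and the empty pre and post fields, can be reproduced outside Splunk. The sketch below uses Python's `re` as a stand-in for `rex` (an assumption; Python also spells named groups `(?P<name>...)`). It suggests the stray `)` after `\d{1,4}` is what breaks compilation, and that `\S{15}` requires 15 consecutive non-whitespace characters on each side of the number, which the sample body never provides.

```python
import re

# Excerpt of the sample body from the question.
body = ("all up 118.05,127.6 118.75 Final Controller for ASR approaches and "
        "Red Knight 164 166164 passed over head toward Nashville 120.8,128.15 John Doe")

# Pattern with the stray ")" after \d{1,4}, as first posted.
broken = r"(?P<pre>\S{15})\d{2,3}\.\d{1,4})(?P<post>\S{15})"
try:
    re.compile(broken)
    compiled = True
except re.error as exc:
    compiled = False
    print("compile failed:", exc)

# Pattern with the stray ")" removed: it compiles, but \S{15} cannot span
# the spaces that surround every frequency in the body.
fixed = re.compile(r"(?P<pre>\S{15})\d{2,3}\.\d{1,4}(?P<post>\S{15})")
print(fixed.findall(body))  # []
```

Swapping `\S` for `.` would allow spaces in the context, but consuming 15 characters after one number can swallow the next number when values sit close together.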
0 Karma

roayers
Explorer

It's working for the most part, but there are still some issues. Try running it against this body text:

15968 ____________________ Message Body ____________________ Tom I have the following freqs for a/a for the NJ F16's - 140.100 140.200 140.700 Chris KBUF
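
On this body the frequencies are separated by single spaces, so `\S{15}` has nothing to match, and any pattern that consumes the trailing context would eat the neighboring frequency. One possible approach, sketched here in Python's `re` as a stand-in for `rex`, is to capture the context inside lookarounds so that nothing outside the number itself is consumed. This is an untested assumption as far as Splunk goes: whether `rex` accepts named captures inside lookarounds, and how it lines up the resulting multivalue fields, would need checking. The lookbehind uses exactly 15 characters because lookbehinds generally need a fixed width, so a number with fewer than 15 characters before it would be skipped.

```python
import re

# Message text from the body above (header stripped).
body = ("Tom I have the following freqs for a/a for the NJ F16's - "
        "140.100 140.200 140.700 Chris KBUF")

context = re.compile(
    r"(?<=(?P<pre>.{15}))"                             # exactly 15 chars before, not consumed
    r"(?<![\d.])(?P<freq>\d{2,3}\.\d{1,4})(?![\d.])"   # the frequency itself
    r"(?=(?P<post>.{0,15}))"                           # up to 15 chars after, not consumed
)

rows = [(m["pre"], m["freq"], m["post"]) for m in context.finditer(body)]
for pre, freq, post in rows:
    print(repr(pre), freq, repr(post))
# "the NJ F16's - " 140.100 ' 140.200 140.70'
# "16's - 140.100 " 140.200 ' 140.700 Chris '
# '40.100 140.200 ' 140.700 ' Chris KBUF'
```

Because the context windows overlap, neighboring frequencies show up inside each other's pre and post values; that is expected with a fixed character window.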

0 Karma

roayers
Explorer

Thanks for all of your efforts

0 Karma

roayers
Explorer

The other option is to extract each of these into a separate field or search result and then run the pre and post search against each field. I'm open to any other possibilities.

0 Karma

horsefez
SplunkTrust

What do you mean by "###.###" ?

0 Karma