
Why is the Splunk rex backreference not working as expected?

inventsekar
Super Champion

Hi all,

I've been struggling to get regular-expression backreferences to work; they are not behaving as expected.

Let's say I want to find out whether a test log contains twin numbers (11, 22, 44, 55, etc.).

| makeresults
| eval log="test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"
| rex field=log "\s(?P<twin>(\d)\1)\s"
| table log twin

I used \1 to refer to the backreference, but it's not working. As suggested in another post, I tried \g{1}, but no luck. I also checked mode=sed, but no luck. Any ideas or suggestions, please?

acharlieh
Influencer

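Capturing groups are numbered for backreferences by the order of their opening parentheses, and the numbering covers all capturing groups, named ones included. So in your example, `(?P<twin>` opens the group that corresponds to \1. Because \1 sits inside the very group it refers to, nothing has been captured for it yet when the engine reaches it, so it can never match there.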
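A minimal sketch of that numbering, using Python's `re` module as a stand-in. This is an assumption: Python's engine is not PCRE, but it numbers named and unnamed groups the same way and accepts the same `(?P<name>...)` syntax. One visible difference is that PCRE compiles the `\1` version and simply never matches, while Python refuses to compile it as a reference to an open group.

```python
import re

log = "test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"

# Original pattern: \1 points at (?P<twin>...), which is still open at that point.
# Python rejects it at compile time; PCRE accepts it but the match always fails.
try:
    re.compile(r"\s(?P<twin>(\d)\1)\s")
    original_rejected = False
except re.error as err:
    original_rejected = True
    print("original pattern rejected:", err)

# Corrected pattern: group 1 is twin, group 2 is the bare (\d), so use \2.
m = re.search(r"\s(?P<twin>(\d)\2)\s", log)
print(m.group(1), m.group("twin"), m.group(2))  # 11 11 1
```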

To make things a bit clearer, let's also name the other capturing groups in your `(?P<twin>(\d)(\d)\2\2)` example:

(?P<twin>(?P<teddy>\d)(?P<bear>\d)\2\2)

With this:

\1 would be the value captured by twin
\2 would be the value captured by teddy
\3 would be the value captured by bear
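The same name-to-number mapping can be inspected in Python's `re` module (again only a stand-in for PCRE), whose compiled patterns expose it through the `groupindex` attribute:

```python
import re

# The fully named version of the follow-up pattern.
pattern = re.compile(r"(?P<twin>(?P<teddy>\d)(?P<bear>\d)\2\2)")

# groupindex maps each group name to the number a backreference would use.
print(dict(pattern.groupindex))  # {'twin': 1, 'teddy': 2, 'bear': 3}
```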

So your given rex (any digit, then any digit, then the first digit twice more) would match 1211, 1311, 1411, 1511, ... and also 1111.
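A quick check of that claim, sketched with Python's `re.fullmatch` as a stand-in for PCRE (an assumption; the candidate strings are just illustrative inputs):

```python
import re

# \2\2 repeats the FIRST digit, so the shape is A B A A, not A A B B.
abaa = re.compile(r"(?P<twin>(\d)(\d)\2\2)")

results = {s: bool(abaa.fullmatch(s)) for s in ["1122", "1211", "1311", "1111"]}
print(results)  # {'1122': False, '1211': True, '1311': True, '1111': True}
```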

If you want to match 1122, you may want to start with something like:

(?P<twin>(\d)\2(\d)\3)

 (which, of course, also matches 1111). I'd also recommend spending some time with https://regex101.com/ and similar sites to learn and experiment with regular expressions.
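Applied to the log line from the follow-up question, the suggested pattern picks out 1122. This is a sketch using Python's `re` as a stand-in for PCRE (an assumption), not a Splunk search:

```python
import re

log = "test log... twin digit matching.. 123 11221 5 $ % & * 123 4 ewrewrewe"

# Group 2 is the first digit and group 3 the second, so the shape is A A B B.
aabb = re.compile(r"(?P<twin>(\d)\2(\d)\3)")

m = aabb.search(log)
print(m.group("twin"))  # 1122
print(bool(aabb.fullmatch("1111")))  # True: nothing forces A and B to differ
```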

inventsekar
Super Champion

Hi @yeahnah, yep, the \2 worked fine.

But how? The Linux sed command's backreferences are \1, \2, up to \9, right?

Let's say I want to match 1122. I tried this one, but no luck:

| makeresults
| eval log="test log... twin digit matching.. 123 11221 5 $ % & * 123 4 ewrewrewe"
| rex field=log "(?P<twin>(\d)(\d)\2\2)"
| table log twin

inventsekar
Super Champion

Thanks @acharlieh, very detailed!

One more question: is Splunk rex's backreference the same as the Linux sed command's backreference, or are there differences between the two?

For reference, here is the GNU sed documentation on backreferences:

https://www.gnu.org/software/sed/manual/html_node/Back_002dreferences-and-Subexpressions.html

acharlieh
Influencer

The general concept of a backreference is the same in both.
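To illustrate that shared concept, here is a sed-style substitution sketched with Python's `re.sub`. This is an assumption: Python's `re` stands in for both PCRE and sed, and neither GNU sed nor Splunk's `rex mode=sed` is actually being run. As in sed's `s/.../.../`, one backreference appears in the pattern and another in the replacement.

```python
import re

log = "test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"

# \2 in the pattern means "the same digit again"; \1 in the replacement
# reinserts whatever the twin group (group 1) captured.
marked = re.sub(r"(?P<twin>(\d)\2)", r"[\1]", log)
print(marked)  # test log... twin digit matching.. 123 [11] 5 $ % & * 123 4 ewrewrewe
```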

Splunk's rex/regex processing, both at ingestion and at search time, is powered by the Perl Compatible Regular Expressions (PCRE) library.

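There are syntactic and execution differences between PCRE and GNU sed regular expressions. For example, sed's default (basic) syntax writes groups as \(...\), has no named groups like (?P<twin>...), and does not treat \d as a digit class. Other forums and sites would be better places to detail the full set of differences.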

I have also heard that some recently released (or soon-to-be-released) products/features may use the RE2 regex library instead, partly because they are written in Go and partly for more predictable time and complexity bounds on execution. The general concepts are similar, but RE2 has its own nuanced differences from PCRE and GNU sed. Most relevant to this thread, RE2 does not support backreferences at all. RE2 patterns can also be tested on regex101.com (pick the Golang flavor instead of the default PCRE).

yeahnah
Motivator

Thanks @acharlieh 

I was about to try to explain it too, but you've already done a far better job than I would have. And yes, regex101 is a great site for learning and practising regex.

yeahnah
Motivator

Hi @inventsekar 

The digit match is actually the second capture group in the rex; the first is the twin field. Try this:

| makeresults
| eval log="test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"
| rex field=log "\s(?P<twin>(\d)\2)\s"
| table log twin

Hope it helps.

inventsekar
Super Champion

Hi @yeahnah, though you gave the answer first, the other user gave the detailed explanation.

I have upvoted your post. Hope you understand my view, thanks.
