Splunk Enterprise

Why is the Splunk rex backreference not working as expected?

inventsekar
SplunkTrust

Hi all,

I have been struggling to get regular-expression backreferences working in rex; they are not behaving as expected.

Let's say I want to find out whether a test log contains twin digits (11, 22, 44, 55, etc.).

| makeresults
| eval log="test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"
| rex field=log "\s(?P<twin>(\d)\1)\s"
| table log twin

I used \1 as the backreference, but it is not working. As suggested in another post, I tried \g{1}, with no luck. I also tried mode=sed, again with no luck. Any ideas or suggestions, please?

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !
1 Solution

acharlieh
Influencer

Capture groups are numbered for backreferences by the order of their opening parentheses, and that count includes all capturing groups, named or not. So in your example, `(?P<twin>` opens the group that corresponds to \1.

To make things a bit clearer, let's name the other capturing groups in your example as well:

(?P<twin>(?P<teddy>\d)(?P<bear>\d)\2\2)

With this:

\1 would be the value captured by twin
\2 would be the value captured by teddy
\3 would be the value captured by bear

So your given rex would match 1211, 1311, 1411, 1511 ... and 1111: any digit, then any second digit, then the first digit repeated twice.

If you want to match 1122, then you may want to start with something like:

(?P<twin>(\d)\2(\d)\3)

(which of course also matches 1111). I'd also recommend spending some time with https://regex101.com/ and other sites for learning and experimenting with regular expressions.
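A minimal sketch of the numbering described above, for anyone who wants to check it outside Splunk. It uses Python's `re` module as a stand-in for PCRE; that substitution is my assumption, not something rex does, but both number capturing groups by opening parenthesis and both accept the `(?P<name>...)` syntax. The test strings are just the examples from this post.

```python
import re

# The pattern from the post, with every capturing group named.
given = re.compile(r"(?P<twin>(?P<teddy>\d)(?P<bear>\d)\2\2)")

# groupindex maps each group name to its backreference number.
numbering = dict(given.groupindex)

# The suggested starting point for matching pairs such as 1122.
pairs = re.compile(r"(?P<twin>(\d)\2(\d)\3)")

samples = ("1211", "1111", "1122")
given_hits = [s for s in samples if given.fullmatch(s)]
pairs_hits = [s for s in samples if pairs.fullmatch(s)]

print(numbering)   # name -> group number
print(given_hits)  # strings matched by the \2\2 pattern
print(pairs_hits)  # strings matched by the \2 ... \3 pattern
```

Here `twin` is group 1, `teddy` is group 2 and `bear` is group 3, so `\2\2` repeats the first digit twice and never looks at the second one.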


inventsekar
SplunkTrust

Hi @yeahnah, yes, the \2 worked fine.

But how? The Linux sed command's backreferences run from \1 to \9, right?

Let's say I want to match 1122. I tried this one, but no luck:

| makeresults
| eval log="test log... twin digit matching.. 123 11221 5 $ % & * 123 4 ewrewrewe"
| rex field=log "(?P<twin>(\d)(\d)\2\2)"
| table log twin

thanks and best regards,
Sekar


inventsekar
SplunkTrust

Thanks @acharlieh, very detailed!

One more question: is a backreference in Splunk's rex the same as a backreference in the Linux sed command, or is there any difference between the two?

For reference, the GNU sed documentation on backreferences:

https://www.gnu.org/software/sed/manual/html_node/Back_002dreferences-and-Subexpressions.html

thanks and best regards,
Sekar


acharlieh
Influencer

The general concept of a backreference is the same in both.

Splunk's rex/regex processing, both at ingestion and at search time, is powered by the Perl Compatible Regular Expressions (PCRE) library.

There are syntactic and execution differences between PCRE and GNU sed's regular expressions, but other forums and sites would be more appropriate for detailing those exact differences.

I think I have also heard that some recently released or upcoming products/features may use the RE2 regex library instead (partly because they are Golang-based, and partly for more predictable time/complexity bounds on execution). Of course RE2 has its own set of nuanced differences from PCRE and GNU sed, but the general concepts are similar, and it too can be tested on regex101.com (pick the Golang flavor instead of the default PCRE).

yeahnah
Motivator

Thanks @acharlieh 

I was about to try and explain it too, but you've done a far better job than me already.  And yes, regex101 is a great site for learning/practising regex.


yeahnah
Motivator

Hi @inventsekar 

The digit match is actually the second capture group in the rex, the first being the twin field. Try this:

| makeresults
| eval log="test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"
| rex field=log "\s(?P<twin>(\d)\2)\s"
| table log twin

Hope it helps.
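For anyone who wants to try the fix outside Splunk, here is a rough sketch in Python. Python's `re` module is only a stand-in for the PCRE engine behind rex (my assumption for illustration), and the two differ on the original `\1` pattern: Python refuses to compile a backreference to a group that is still open, while the PCRE documentation says a backreference inside its own group fails on first use, so the pattern compiles but never matches. The `digit` group name in the last pattern is hypothetical; in rex a named group would presumably also be extracted as a field of that name.

```python
import re

log = "test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"

# Original attempt: \1 points at `twin`, which is still open at that point.
# Python rejects this at compile time, so capture the error instead.
try:
    re.compile(r"\s(?P<twin>(\d)\1)\s")
    self_reference_error = None
except re.error as exc:
    self_reference_error = str(exc)

# Fixed pattern: the inner (\d) is group 2, so \2 repeats the digit.
fixed = re.search(r"\s(?P<twin>(\d)\2)\s", log)
twin = fixed.group("twin") if fixed else None

# Alternative: name the digit group and refer to it by name, which
# avoids counting parentheses altogether.
named = re.search(r"\s(?P<twin>(?P<digit>\d)(?P=digit))\s", log)
named_twin = named.group("twin") if named else None

print(self_reference_error)
print(twin)        # 11
print(named_twin)  # 11
```

The named form `(?P=digit)` is also PCRE syntax, so it may be worth trying in rex when the group numbers get hard to count.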

inventsekar
SplunkTrust

Hi @yeahnah, although you gave the answer first, the other user gave the more detailed explanation.

I have upvoted your post. I hope you understand my view, thanks.

thanks and best regards,
Sekar
