Hi all,
I have been struggling to get a regular expression backreference working in rex; it does not behave as expected.
Let's say I want to find out whether a test log contains a twin number (11, 22, 44, 55, etc.):
| makeresults
|eval log="test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"
| rex field=log "\s(?P<twin>(\d)\1)\s"
| table log twin
I used \1 for the backreference, but it is not working. As suggested in another post, I tried \g{1}, but no luck. I also tried mode=sed, but no luck. Any ideas or suggestions, please?
Capture groups are numbered for backreferences by the order of their opening parentheses. That counts all capturing groups, named ones included, so in your example `(?P<twin>` opens the group that corresponds to \1.
To make things a bit clearer, let's name the other capturing groups in your example as well:
(?P<twin>(?P<teddy>\d)(?P<bear>\d)\2\2)
With this:
\1 would be the value captured by twin
\2 would be the value captured by teddy
\3 would be the value captured by bear
So your given rex matches a digit, then any digit, then the first digit twice more: 1211, 1311, 1411, 1511 ... and 1111.
If you want to match 1122, then you may want to start with something like:
(?P<twin>(\d)\2(\d)\3)
(which of course also matches 1111). I'd also recommend spending some time with https://regex101.com/ and similar sites for learning and experimenting with regular expressions.
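The numbering described above can be checked with a small sketch. It uses Python's `re` module as a stand-in, which is an assumption on my part: Splunk uses PCRE, not Python, but both accept the `(?P<name>...)` syntax and number groups by opening parenthesis. The sample strings are made up for illustration.

```python
import re

# Named groups are numbered like any other capturing group:
# twin opens first (1), then teddy (2), then bear (3).
given = re.compile(r"(?P<twin>(?P<teddy>\d)(?P<bear>\d)\2\2)")
print(dict(given.groupindex))  # name -> group number

# \2 refers to teddy, so this wants: digit, any digit, first digit twice.
print(given.search("x 1211 y").group("twin"))
print(given.search("x 1122 y"))  # no match for 1122

# Two consecutive pairs: \2 repeats the first digit, \3 the second.
pairs = re.compile(r"(?P<twin>(\d)\2(\d)\3)")
log = "test log... twin digit matching.. 123 11221 5 $ % & * 123 4 ewrewrewe"
print(pairs.search(log).group("twin"))
```

The same patterns can be pasted into regex101 with the PCRE flavor selected to confirm the behavior there.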
Hi @yeahnah, yep, the \2 worked fine.
But how? The Linux sed command's backreferences are \1, \2, up to \9, right?
Let's say I want to match 1122. I tried this, but no luck:
| makeresults
|eval log="test log... twin digit matching.. 123 11221 5 $ % & * 123 4 ewrewrewe"
| rex field=log "(?P<twin>(\d)(\d)\2\2)"
| table log twin
Thanks @acharlieh, very detailed!
One more question: is Splunk rex's backreference the same as the Linux sed command's backreference, or is there any difference between the two?
For reference, here is the GNU sed documentation on backreferences:
https://www.gnu.org/software/sed/manual/html_node/Back_002dreferences-and-Subexpressions.html
The general concept of a backreference is the same between both.
Splunk's rex/regex processing in ingestion and during a search is powered by the Perl Compatible Regular Expressions library.
There are syntactic and execution differences between PCRE and GNU sed's regular expressions, but other forums and sites would be more appropriate for detailing those exact differences.
I think I have also heard that there may be some (recently released / soon coming) products/features that leverage the RE2 regex library instead (partly from being Golang based, and also for more predictable time/complexity bounds of execution). Of course RE2 has its own set of nuanced differences from PCRE and GNU sed, but the general concepts are similar, and it too can be tested on regex101.com (pick the Golang flavor instead of the default PCRE).
Thanks @acharlieh
I was about to try and explain it too, but you've done a far better job than me already. And yes, regex101 is a great site for learning/practising regex.
Hi @inventsekar
The digit match is actually the second capture group in the rex - the first being the twin field. Try this...
| makeresults
|eval log="test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"
| rex field=log "\s(?P<twin>(\d)\2)\s"
| table log twin
Hope it helps
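As a quick sanity check of the fix above, here is a minimal sketch that runs the same pattern against the sample log. It assumes Python's `re` module as a stand-in for the PCRE engine Splunk uses, so it only illustrates the group numbering, not rex itself. One difference worth knowing: Python refuses the original `\1` form at compile time because group 1 (twin) is still open at that point, whereas in Splunk the thread reports it simply did not match.

```python
import re

log = "test log... twin digit matching.. 123 11 5 $ % & * 123 4 ewrewrewe"

# twin is group 1, the inner (\d) is group 2, so the repeat must be \2.
fixed = re.compile(r"\s(?P<twin>(\d)\2)\s")
match = fixed.search(log)
print(match.group("twin"))

# The original attempt refers to group 1 from inside group 1.
try:
    re.compile(r"\s(?P<twin>(\d)\1)\s")
    rejected = False
except re.error as err:
    rejected = True
    print("rejected:", err)
```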
Hi @yeahnah, though you gave the answer first, the other user gave the detailed explanation.
I have upvoted your post. Hope you understand my view, thanks.