I wanted to see if anyone else had come across some strange behaviour when using the (?J) mode modifier in the 'rex' command.
This modifier should allow you to use the same capture group name more than once in the same regular expression. If you try to do this without the modifier, you get the error:
Regex: two named subpatterns have the same name
In some 'rex' work that I'm doing, I'm using the Regular Expression conditionals syntax for 'If, then, else'.
The syntax for this is: (?(condition)then|else)
I'm using a number of these in a nested way to match some code in Cisco ACLs that has very poor (read awful) syntax structure.
(Anyway, that's another story).
The problem that I'm seeing in Splunk is that if the same capture group name appears in both the 'Then' and 'Else' parts, the field is only extracted for the 'Else' case.
If the match happens in the 'Then' part, the field appears to get 'nulled' by the second definition in the 'Else' part. This feels wrong: if the 'Then' case is matched, the regex engine shouldn't be going through the 'Else' part at all.
You can test this behaviour in Splunk with the following test case:
| makeresults | eval case1="a then match" | eval case2="a else match" | rex field=case1 "(?J)a (?(?=then)(?<case1_match>then)|(?<case1_match>else)) match" | rex field=case2 "(?J)a (?(?=then)(?<case2_match>then)|(?<case2_match>else)) match" | table case*
In the resulting table, you should get:
case1 = "a then match"
case1_match = "then"
case2 = "a else match"
case2_match = "else"
What actually happens is that the field 'case1_match' is blank / null.
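For anyone who wants to see the symptom outside Splunk, here is a minimal Python sketch of one possible explanation. It is only a model: Python's 're' module supports neither (?J) nor lookahead conditionals, so the two same-named groups are stood in for by two differently named groups (m_then and m_else, names invented for illustration), and last_definition_wins is a hypothetical extractor, not Splunk's actual code.

```python
import re

# Stand-in for the (?J) pattern: one group per branch, with distinct names
# because Python's re rejects duplicate group names.
PATTERN = re.compile(r"a (?:(?P<m_then>then)|(?P<m_else>else)) match")


def last_definition_wins(text):
    """Hypothetical extractor that only reads the last-defined group."""
    m = PATTERN.search(text)
    return m.group("m_else") if m else None


def first_set_wins(text):
    """Extractor that returns whichever group actually took part in the match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return next((value for value in m.groups() if value is not None), None)


for sample in ("a then match", "a else match"):
    print(sample, "|", last_definition_wins(sample), "|", first_set_wins(sample))
# a then match | None | then
# a else match | else | else
```

The first extractor reproduces the blank 'Then' case described above, while the second gives the result I was expecting. Whether Splunk really behaves like the first one internally is an assumption on my part.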
I've tried the expression on the online Regex101 site (unfortunately I can't post URLs yet, but copy/paste 'regex101.com/r/lX2uY8/2').
Has anyone else come across this issue before?
Is it by design or is it a bug?
I'm sure that there are other ways for me to tackle what I'm looking at (I'm not too worried about that). What I want to know is whether this is functionality that 'should' work in Splunk.
This is in version 6.4.2 of Splunk Enterprise.
Mild tangent: have you considered this alternative?
| rex field=case1 "a (?<case1_match>(?(?=then)then|else)) match" | rex field=case2 "a (?<case2_match>(?(?=then)then|else)) match"
Thanks for the feedback. Yes, your example would certainly work, as would the very simple form:
| rex field=case1 "a (?<case1_match>then|else) match" | rex field=case2 "a (?<case2_match>then|else) match"
without any need for the conditional.
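As a quick sanity check of that simple form outside Splunk, here is a small Python sketch. Note the assumptions: Python writes named groups as (?P<name>...) rather than (?<name>...), and the helper name extract and the group name case_match are made up for this example.

```python
import re

# One named group wrapping the alternation, so the name is defined only once.
SIMPLE = re.compile(r"a (?P<case_match>then|else) match")


def extract(text):
    """Return the captured branch, or None when the text does not match."""
    m = SIMPLE.search(text)
    return m.group("case_match") if m else None


for sample in ("a then match", "a else match", "a maybe match"):
    print(sample, "->", extract(sample))
# a then match -> then
# a else match -> else
# a maybe match -> None
```

Because the name only appears once, there is no duplicate definition for the extraction to trip over.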
The example in the original post was just to demonstrate the issue that I'm seeing with the duplicate subpattern mode.
I'm more curious to see whether others are trying to use the (?J) mode and what their thoughts are.
The great thing about Splunk and Regex is that there's always going to be lots of ways to get to the answer!
It seems like having two groups with the same name in rex is confusing Splunk's field extraction.
This worked in my test.
| makeresults | eval case1="a then match" | eval case2="a else match" | rex field=case1 "a (?<case1_match>(?(?=then)then|else)) match" | rex field=case2 "a (?<case2_match>(?(?=then)then|else)) match" | table case*
If we consider scalability and the cost in system resources, I would avoid conditionals and lookahead/lookbehind in regexes as much as possible. But as a check of functionality, this specific case is a good question 🙂
I worked around the problem by creating multiple transform stanzas and prioritizing them in the report stanza.
REGEX = a (?then) match
REGEX = a (?else) match
REPORT-thenelse = thenmatch,elsematch
Thanks for the suggestion. Another interesting way to solve the example.
What I'm actually doing is using conditionals to do lots of branching. Think of it like using nested IFs in Excel. You end up with something like this
There are lots of other ways to achieve what I'm looking at; I'm more curious about the (?J) behaviour and whether others are using it.
It looks like this may be a 'PCRE' thing as opposed to anything to do with Splunk.
The site here (http://www.regular-expressions.info/branchreset.html) suggests that:
"In Perl and PCRE, it is best to use a branch reset group when you want groups in different alternatives to have the same name. That's the only way in Perl and PCRE to make sure that groups with the same name really are one and the same group."
So dropping the (?J) mode and the conditional, I can still use duplicate subpatterns within a Branch Reset group:
| rex field=case1 "a (?|(?<case1b_match>then)|(?<case1b_match>else)) match" | rex field=case2 "a (?|(?<case2b_match>then)|(?<case2b_match>else)) match"
Not quite what I'm looking for, as I'd still like to use the conditional form, but it's an interesting one nevertheless.
Every day's a school-day when it comes to Regular Expressions!