I have written a regex to perform an extraction in transforms.conf. I've tested it in multiple PCRE-compliant regular expression editors, where it works perfectly, but the results in Splunk after it runs through a transform are different. Here are the details.
I have a field named RECORD_NAMES with a value of...
Zx1%2CEOS 5D Mark II%252C body%2CHR10 & Tenba Xpress: Medium Pouch%252C Black/Teal%2CDigital IXUS 85 IS%2CZ980%2CFUN Flash Single Use Camera%252C 1+1 Pack%2CPrima Super 130U Date%2CEOS 5D Mark II + EF 24-105mm f4L IS USM%2CQuickCam® Chat for Skype%2CDigital IXUS 80 IS%2C15x50 IS%2CCyber-shot T70%252C Black
From this I need to create a new multi-value field by splitting the value into its individual values, which are delimited by %2C.
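To make the target concrete, here is a minimal Python sketch of the result I am after, run outside Splunk. The sample is a shortened copy of the value above, and the variable names are only illustrative. Note that %252C must stay inside a value, since it does not contain the delimiter %2C.

```python
# Shortened sample of the RECORD_NAMES value (illustrative, not the full field).
record_names = (
    "Zx1%2CEOS 5D Mark II%252C body%2C"
    "HR10 & Tenba Xpress: Medium Pouch%252C Black/Teal%2C"
    "Digital IXUS 85 IS"
)

# Split on the literal delimiter; "%252C" is left alone because the
# three-character sequence "%2C" does not occur inside it.
expected_values = record_names.split("%2C")

for value in expected_values:
    print(value)
```

None of the resulting values should begin with %2C.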
Here is my regex...
This regex extracts each value as expected when run in a regex editor, but after the transform each extracted value (with the exception of the first, Zx1) is prefixed by %2C. As specified above, %2C is the delimiter and should not be part of the value.
Here is how my transform to create a new field aaa is configured...
SOURCE_KEY = RECORD_NAMES
REGEX = (?<=%2C|^)(.+?)(?=%2C|$)
FORMAT = aaa::$1
MV_ADD = True
Here are the actual individual values of my new field aaa in Splunk after that transform...
%2CQuickCam® Chat for Skype
%2CPrima Super 130U Date
%2CHR10 & Tenba Xpress: Medium Pouch%252C Black/Teal
%2CFUN Flash Single Use Camera%252C 1+1 Pack
%2CEOS 5D Mark II%252C body
%2CEOS 5D Mark II + EF 24-105mm f4L IS USM
%2CDigital IXUS 85 IS
%2CDigital IXUS 80 IS
%2CCyber-shot T70%252C Black
Thanks, Ayn. The issue seems to be related to the start-of-line anchor ^: when I remove it, the regex matches every value except the first, and none of the matches include %2C. No matter how I reconfigured the regex, though, I could not resolve the issue, so I am working around it with two transforms. The first extracts the first value and the second extracts all remaining values.
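The two-pass idea can be sketched in Python as follows. The exact regexes used in the two transforms are not given here, so both patterns below are assumptions: one anchored with ^ for the first value, and one using only the fixed-width lookbehind for the rest. This only illustrates the logic; it does not reproduce Splunk's behavior.

```python
import re

# Shortened sample of the RECORD_NAMES value (illustrative).
record_names = "Zx1%2CEOS 5D Mark II%252C body%2CDigital IXUS 85 IS%2CZ980"

# Pass 1 (assumed pattern): only the first value, anchored at the start.
first_pattern = re.compile(r"^(.+?)(?=%2C|$)")
# Pass 2 (assumed pattern): every value that directly follows a delimiter.
rest_pattern = re.compile(r"(?<=%2C)(.+?)(?=%2C|$)")

first_match = first_pattern.search(record_names)
first_values = [first_match.group(1)] if first_match else []
rest_values = rest_pattern.findall(record_names)

# Combining both passes plays the role of MV_ADD on the same field.
all_values = first_values + rest_values
print(all_values)
```

Together the two passes yield the same list as splitting the sample on %2C.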
I'm surprised that works in other PCRE engines.
When I run that regex in Perl I get this:
Variable length lookbehind not implemented in regex
This might be better suited:
Yeah, I'm aware of this, though Perl and PCRE are not strictly identical. Regardless, the regex can be changed to be
which will also work in Perl but still result in the same behavior from Splunk.
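As a side illustration of how engines differ on lookbehind, Python's re module also rejects alternatives of different widths inside one lookbehind. The sketch below shows that, together with one possible rewrite that moves ^ outside the lookbehind. That rewrite is an assumption for illustration and not necessarily the regex referred to above, and the result says nothing about what Splunk does with it.

```python
import re

# Shortened sample of the RECORD_NAMES value (illustrative).
sample = "Zx1%2CEOS 5D Mark II%252C body%2CZ980"

# The lookbehind mixes a 3-character alternative (%2C) with a
# zero-width one (^), which Python's re refuses to compile.
try:
    re.compile(r"(?<=%2C|^)(.+?)(?=%2C|$)")
    original_accepted = True
except re.error:
    original_accepted = False

# Hypothetical rewrite: keep the lookbehind fixed-width, put ^ beside it.
rewritten = re.compile(r"(?:(?<=%2C)|^)(.+?)(?=%2C|$)")
values = rewritten.findall(sample)

print(original_accepted)
print(values)
```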