
Regular expression used in transform not performing as expected

Splunk Employee

I have written a regex to perform an extraction in transforms.conf. I've tested it in multiple PCRE-compliant regular expression editors and it works perfectly, but the results in Splunk after it is run through a transform are different. Here are the details.

I have a field named RECORD_NAMES with a value of...

Zx1%2CEOS 5D Mark II%252C body%2CHR10 & Tenba Xpress: Medium Pouch%252C Black/Teal%2CDigital IXUS 85 IS%2CZ980%2CFUN Flash Single Use Camera%252C 1+1 Pack%2CPrima Super 130U Date%2CEOS 5D Mark II + EF 24-105mm f4L IS USM%2CQuickCam® Chat for Skype%2CDigital IXUS 80 IS%2C15x50 IS%2CCyber-shot T70%252C Black

From this I need to create a new multi-value field by splitting out each individual value, using %2C as the delimiter.
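For reference, the target result is what a plain split on the literal delimiter would give. A minimal Python sketch, using a shortened copy of the sample value (the variable names are only illustrative and not part of the Splunk configuration):

```python
# Shortened sample of the RECORD_NAMES value from the question.
record_names = "Zx1%2CEOS 5D Mark II%252C body%2CZ980%2CCyber-shot T70%252C Black"

# Splitting on the literal delimiter gives the desired multi-value result.
# "%252C" is left alone because it does not contain the substring "%2C".
values = record_names.split("%2C")

for value in values:
    print(value)
```

Note that the double-encoded commas (%252C) stay inside their values, which is the behavior the regex below is also meant to have.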

Here is my regex...

(?<=%2C|^)(.+?)(?=%2C|$)

This regex extracts each value as expected when run in a regex editor. After the transform, however, each extracted value (with the exception of the first, Zx1) is prefixed by %2C, which is the delimiter specified above and should not be there.

Here is how my transform to create a new field aaa is configured...

[logserveroutput-RecordName]
SOURCE_KEY = RECORD_NAMES
REGEX = (?<=%2C|^)(.+?)(?=%2C|$)
FORMAT = aaa::$1
MV_ADD = True

Here are the actual individual values of my new field aaa in Splunk after that transform...

Zx1

%2CZ980

%2CQuickCam® Chat for Skype

%2CPrima Super 130U Date

%2CHR10 & Tenba Xpress: Medium Pouch%252C Black/Teal

%2CFUN Flash Single Use Camera%252C 1+1 Pack

%2CEOS 5D Mark II%252C body

%2CEOS 5D Mark II + EF 24-105mm f4L IS USM

%2CDigital IXUS 85 IS

%2CDigital IXUS 80 IS

%2CCyber-shot T70%252C Black

%2C15x50 IS


Splunk Employee

The issue seems to be related to the start-of-line anchor ^ (thanks, Ayn): when I removed it, the regex matched all values except the first without including %2C. No matter how I tried to reconfigure the regex, though, I could not find a way to resolve the issue, so instead I am performing two transforms to work around it. The first extracts the first value and the second extracts all remaining values.
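The exact expressions used in the two transforms were not posted, so the following is only a sketch of the idea in Python. The two patterns (`first_pass`, `rest_pass`) are assumptions: one anchored with ^ for the first value, one using only the fixed-width lookbehind for the rest.

```python
import re

sample = "Zx1%2CEOS 5D Mark II%252C body%2CZ980"

# Assumed pattern for the first transform: only the leading value.
first_pass = re.compile(r"^(.+?)(?=%2C|$)")

# Assumed pattern for the second transform: every value preceded by %2C.
# Without the ^ alternative, the lookbehind has a fixed width of three.
rest_pass = re.compile(r"(?<=%2C)(.+?)(?=%2C|$)")

# Combining both passes mimics two transforms adding to one multi-value field.
values = first_pass.findall(sample) + rest_pass.findall(sample)
print(values)
```

This only illustrates the regex logic. In Splunk, each pattern would sit in its own transforms.conf stanza writing to the same field.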


Influencer

I'm surprised that that works in other PCRE engines.

When I run that regex in Perl I get this:

Variable length lookbehind not implemented in regex

This might be more suitable:

(?:%2C|^)(.+?)(?=%2C|$)
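Python's re module behaves like Perl here, so it makes a convenient place to check both points. The sketch below (sample value shortened, variable names illustrative) shows the original lookbehind being rejected and the non-capturing version returning clean values. It says nothing about how Splunk itself evaluates the pattern.

```python
import re

sample = "Zx1%2CEOS 5D Mark II%252C body%2CZ980"

# Python's re module, like Perl, rejects a lookbehind whose alternatives
# have different widths ("%2C" is three characters, "^" is zero).
try:
    re.compile(r"(?<=%2C|^)(.+?)(?=%2C|$)")
    lookbehind_accepted = True
except re.error:
    lookbehind_accepted = False

# The suggested replacement consumes the delimiter in a non-capturing
# group, so only the value itself ends up in capture group 1.
suggested = re.compile(r"(?:%2C|^)(.+?)(?=%2C|$)")
values = suggested.findall(sample)

print(lookbehind_accepted)
print(values)
```

PCRE, unlike Perl, allows the top-level alternatives of a lookbehind to have different fixed lengths, which would explain why the original expression was accepted by the PCRE-based editors.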


Splunk Employee

Thanks Ayn, the ^ does seem to be the cause of the issue. Although I was unable to resolve it directly, it did help me find a workaround (see the comment on my original question).


Legend

You're looking for %2C OR the start of a line. Are you entirely sure that the regex will prefer to match %2C(data) instead of ^(%2Cdata)?

Splunk Employee

Yeah, I'm aware of this, though Perl and PCRE are not strictly identical. Regardless, the regex can be changed to be

(?:^|(?<=%2C))(.+?)(?=%2C|$)

which will also work in Perl but still result in the same behavior from Splunk.
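One possible explanation, which nothing in this thread confirms, is that with MV_ADD the regex is re-applied to the text remaining after each match, so that ^ matches again at the start of that remainder. The Python sketch below simulates that assumption. The function `extract_from_remainder` is purely hypothetical and is not Splunk's actual implementation.

```python
import re

PATTERN = re.compile(r"(?:^|(?<=%2C))(.+?)(?=%2C|$)")

def extract_whole_subject(text):
    """Match repeatedly against the same subject, as regex editors do."""
    return [m.group(1) for m in PATTERN.finditer(text)]

def extract_from_remainder(text):
    """Hypothetical: restart matching on the text left after each match."""
    values = []
    remainder = text
    while remainder:
        match = PATTERN.search(remainder)
        if match is None:
            break
        values.append(match.group(1))
        # Everything before match.end() is discarded, so the next search
        # sees "%2C..." at position 0, where ^ matches again.
        remainder = remainder[match.end():]
    return values

sample = "Zx1%2CEOS 5D Mark II%252C body%2CZ980"
print(extract_whole_subject(sample))
print(extract_from_remainder(sample))
```

Under that assumption the second function reproduces the %2C-prefixed values reported in the question. It would also fit the observation that dropping ^ fixes every value except the first.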
