Splunk Search

Extracting multiple values from a multivalue field: using rex vs. props.conf or transforms.conf

arkadyz1
Builder

This is a follow-up to my previous question.

In that thread, I managed to extract a multivalue index-time field, but could not use it as the source for a further extraction. Right now I'm planning a workaround. I already have a multivalue mainKey and want to extract a subKey from it, preferably not on the search line but in props.conf/transforms.conf.

Here is the search string that I'm using right now:

index = "testIndexTimeFields" sourcetype = "testIndexFields" | rex field=mainKey "^[a-zA-Z]*(?P<subKey>\d+)$"

And it does exactly what I expect: for those events where mainKey is multivalue, the corresponding values of subKey are extracted from each individual mainKey wherever possible, creating multivalue subKey when there is more than one parseable mainKey value.

Is there a good way to mimic that search-time extraction in props.conf/transforms.conf? So far I have tried several REPORT settings, to no avail...

The documentation is extremely vague on, for example, the SOURCE_KEY value, but I did try both SOURCE_KEY=mainKey and SOURCE_KEY=field:mainKey with no success. Any ideas?

I check the functionality by adding | table mainKey, subKey to that search string and looking at the results. My objective is to remove the | rex ... part, append the same | table ... and still get the same results, with props and transforms performing the extraction for me.

1 Solution

arkadyz1
Builder

Update: I found the right way after realizing the error in my testing.

So here it goes:
The transforms.conf stanza for creating that subKey now looks like this:

[subKey]
REGEX = ^[a-zA-Z]*(?P<subKey>\d+)$
#FORMAT = subKey::$1
SOURCE_KEY = mainKey
MV_ADD = true

FORMAT is commented out because the REGEX itself already uses a named capture group; I left it there as a reminder.
Moreover, I found that if I need more than one subKey extraction (for example, if different formats of mainKey require different results), I can add another stanza and reference it from props.conf, even in the same REPORT property. For example, I also added the following extraction: when it finds a mainKey value starting with "!" (an exclamation mark), it strips that character and saves the rest as a subKey (purely as an illustration):

[subKey1]
REGEX = (?m-s)^!(?P<subKey>.*)$
SOURCE_KEY = mainKey
MV_ADD = true

and referenced both from this props.conf property: REPORT-subKey = subKey, subKey1. I needed the (?m-s) at the beginning of the second REGEX because otherwise the .* would consume all the subsequent mainKey values. The set of multiple values seems to be treated internally as one multi-line string, and $ matches both the end of that whole string and the end of each individual value, with or without the flag, as my first stanza shows.
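For completeness, the props.conf side of this could look roughly like the sketch below. The stanza name is an assumption: it reuses the sourcetype from the search string in the question (testIndexFields), so adjust it to whatever sourcetype, source, or host stanza your events actually fall under.

```ini
# props.conf (sketch; the stanza name assumes the sourcetype from the question)
[testIndexFields]
# Run both transforms.conf stanzas, in the listed order, at search time
REPORT-subKey = subKey, subKey1
```

With this in place, the search from the question should be testable without the | rex ... part, keeping only | table mainKey, subKey at the end.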

When several extraction stanzas each match one or more values in mainKey, the first stanza extracts its values (potentially many of them), then the values extracted by the second stanza are appended, and so on. As a result, the order of the subKey values may no longer correspond to the order of the mainKey values they came from, but that is fine in my case.
