Order of execution / precedence of multiple TRANSFORMS

PavelP
Motivator

consider:

Log:

2020-04-01 10:20:30 firstabc secondxyz

props.conf

[test]
REPORT-a = report_a, report_b

transforms.conf

[report_a]
REGEX=first(?<a>\w+)

[report_b]
REGEX=second(?<a>\w+)

Question 1: what is the value of the field "a"?

Question 2: will the results be the same with this props.conf:

[test]
REPORT-a = report_a
REPORT-b = report_b

Challenge: try guessing without testing first 🙂

I'll spare you a search - here is a link for a previous discussion with two different opinions: https://answers.splunk.com/answers/320868/what-is-the-order-of-execution-precedence-of-multi.html

Question 3: do you get the expected results?

This post is not a 1 April joke 🙂

Edit 02.04.2020: this test case actually confirms the second statement, "fields are not overridden so once an earlier-executed transform has given a field a value, later-executed ones will not update/overwrite that original value". Otherwise the "a" field would have the value "xyz".

I was previously ready to bet that "later-executed ones can update/overwrite that original value", but as you can see that is not the case.

The purpose of this post is to ask the community to help clarify this. Maybe somebody has a link to where this behaviour is documented.

1 Solution

anmolpatel
Builder

@PavelP
In your example:

[test]
REPORT-a = report_a, report_b

and

[test]
REPORT-a = report_a
REPORT-b = report_b

When Splunk executes the search-time extraction, it runs both transforms, report_a and report_b. The second version is essentially the same as far as the transforms are concerned. Splitting REPORT-a into REPORT-a and REPORT-b does not change the outcome, because both transforms use the same KEY 'a' in the REGEX.
When report_a runs, the KEY 'a' has no value yet, so it is set to the initial extraction (abc). Then report_b runs, finds that the KEY 'a' already has a value, and its extracted value is discarded.
@to4kawa pointed in the right direction. The later value extracted by report_b is discarded because MV_ADD defaults to false.

That is documented here: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Transformsconf#GLOBAL_SETTINGS

MV_ADD = [true|false]
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls what the extractor does when it finds a field which
   already exists.
* If set to true, the extractor makes the field a multivalued field and
   appends the newly found value, otherwise the newly found value is
   discarded.
* Default: false

You can use alternating stanzas, but the KEY needs to be different in each. If the KEY needs to be the same, use MV_ADD to create a multivalued field.

props.conf
[test]
REPORT-a = report_a, report_b

transforms.conf
[report_a]
REGEX=first(?<a>\w+)

[report_b]
REGEX=second(?<b>\w+)

This will return the extraction as --
a = abc
b = xyz
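
For the same-KEY case, here is a minimal sketch of the MV_ADD variant, assuming the log line and stanza names from the question. MV_ADD is set on both transforms for simplicity; going by the spec text quoted above, it is the transform that finds the field already populated (here report_b) that needs it.

```ini
# props.conf
[test]
REPORT-a = report_a, report_b

# transforms.conf
[report_a]
REGEX = first(?<a>\w+)
MV_ADD = true

# report_b extracts into the same field "a"; with MV_ADD = true
# the new value is appended instead of being discarded
[report_b]
REGEX = second(?<a>\w+)
MV_ADD = true
```

With this in place, "a" should come back as a multivalued field holding abc and xyz rather than abc alone.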

There is also this excellent document on lexicographical ordering. Although it is not applicable to this scenario, it gives good insight into how to name props / transforms.
https://docs.splunk.com/Documentation/SplunkCloud/8.0.2003/Knowledge/Searchtimeoperationssequence#Le...
Hope this clarifies.

PavelP
Motivator

Thank you @anmolpatel and @to4kawa, I think you've helped me to understand this topic. The answer is fully and clearly documented in a few words in the spec file:

MV_ADD = [true|false]
* NOTE: This setting is only valid for search-time field extractions.
* Optional. Controls what the extractor does when it finds a field which
  already exists.
* If set to true, the extractor makes the field a multivalued field and
  appends the newly found value, otherwise the newly found value is
  discarded.
* Default: false

I'll emphasize it for myself: If set to true, the extractor makes the field a multivalued field and appends the newly found value, otherwise the newly found value is discarded.

It is a trap for somebody like me, coming from a programming background, where a value can always be overwritten. Splunk logic differs from programming logic. In the Splunk universe, when using a transform, there are only two options for handling 'what the extractor does when it finds a field which already exists':

  1. discard it
  2. create MV field if MV_ADD=true
  3. there is no option #3 !

And here is the point where REPORT differs from an inline extraction with rex, where you can overwrite fields:

|makeresults | eval _raw="firstabc1 secondxyz1" | rex "first(?<a>\w+)" | rex "second(?<a>\w+)"

I thank you guys, I'll accept both answers!

PavelP
Motivator

Hello @anmolpatel, thank you for your time and help!
You wrote: "In your second version, you're assigning the extracted value to a new Key b, so Splunk does not discard the value. The new Key has not been assigned a value, thus setting it to the extracted value."
Actually both versions extract a field "a"; there is no field "b" in the examples. The only difference is how the stanzas in transforms.conf are called.
I get the same results in both cases: the later-executed value does not override an existing value. In other words, "fields are not overridden so once an earlier-executed transform has given a field a value, later-executed ones will not update/overwrite that original value".

My conclusion so far is that extractions with transforms using alternative stanzas for the same field should be avoided, because only the value from the first matching transform will be kept.

I'm going to check whether any existing Apps/Add-ons have this kind of configuration, to understand whether using alternative transforms is a "bad practice" or has a legitimate application.

anmolpatel
Builder

@PavelP I misread the transforms and so did the extraction incorrectly. I've updated my answer to address the original query.

It is not that only the first matched transform is applied. The KEY value is assigned by the first extraction, so Splunk does not override it with the value it finds with the second REGEX. When using the same KEY in the REGEX, you can create a multivalued field rather than override the extraction.

If you swap the stanzas, you will get a different result:
[test]
REPORT-a = report_b, report_a
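
Spelled out as a sketch, assuming the transforms.conf from the question is unchanged and MV_ADD is left at its default of false:

```ini
# props.conf: report_b is now listed first
[test]
REPORT-a = report_b, report_a

# transforms.conf (unchanged from the question)
[report_a]
REGEX = first(?<a>\w+)

[report_b]
REGEX = second(?<a>\w+)
```

By the same first-value-wins logic, "a" would be expected to be xyz here, with the abc match from report_a discarded.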

to4kawa
Ultra Champion

A1: abc; getting both values needs MV_ADD = true
A2: same
A3: set MV_ADD = true in transforms.conf

PavelP
Motivator

thank you for trying!

In this case the second statement is correct:

so once an earlier-executed transform has given a field a value, later-executed ones can update/overwrite that original value.

OR

"Although the execution does not "stop early" fields are not overridden so once an earlier-executed transform has given a field a value, later-executed ones will not update/overwrite that original value."

I'm wondering if it is documented somewhere 🙂

to4kawa
Ultra Champion

later-executed ones can update/overwrite that original value.
wow, I did not know that.
Thank you

PavelP
Motivator

It is actually the second statement, "fields are not overridden so once an earlier-executed transform has given a field a value, later-executed ones will not update/overwrite that original value", that this test case confirms. Otherwise the "a" field would have the value "xyz".

I was previously ready to bet that "later-executed ones can update/overwrite that original value", but as you can see that is not the case.

The purpose of this post is to ask the community to help clarify this. Maybe somebody has a link to where it is documented.
