Knowledge Management

Search time extraction precedence

VatsalJagani
SplunkTrust

I want to get some ideas on search-time field extraction.

I already know the precedence that applies when host, source, and sourcetype stanzas are all present.

I also know the order in which search-time operations are applied.

But I want to understand the following questions about this props.conf:

[host::test]
EXTRACT-a = <regex that extracts field a="h1">
EVAL-b = "h2"

[source::test]
EXTRACT-a = <regex that extracts field a="s1">
EVAL-b = "s2"
EVAL-c = "s3"
EXTRACT-e = <regex that extracts field common="s4">

[test]
EXTRACT-a = <regex that extracts field a="st1">
EVAL-b = "st2"
EXTRACT-c = <regex that extracts field c="st3">
EXTRACT-d = <regex that extracts field common="st4">

 

 

 

  1. What will be the final output fields in the above scenario?
  2. In what sequence will the extractions (all the parameters) be evaluated?
  3. Are there any parameters that will be skipped?
  4. When we say parameter "X" is applied before parameter "Y", how do we know whether Y will override the field value or the value from X will be kept?
  5. Is there any documentation that covers these scenarios?

VatsalJagani
SplunkTrust

The props.conf below covers all of these scenarios and shows how precedence and execution behave.

There are two stages:

  1. Precedence - the stanzas are evaluated and duplicate parameters are removed: when the same parameter appears in more than one stanza, only the one from the highest-precedence stanza is kept.
  2. Execution - once the precedence stage is complete, the stanzas no longer matter.
    1. EXTRACT is applied first, then EVAL. If both set the same field, EVAL overrides the value extracted by EXTRACT.
    2. Multiple EXTRACT statements are executed in order of their class names (alphabetically). If two EXTRACT statements extract the same field, the second does not overwrite the value extracted by the first.
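
To make the precedence stage concrete, here is a toy Python model applied to the props.conf from the question. It is a sketch, not Splunk's implementation, and it assumes that a setting is identified by its full TYPE-class name (so EVAL-c and EXTRACT-c are different settings) and that [source::...] beats [host::...], which beats [<sourcetype>], as stated in props.conf.spec. Each placeholder regex is replaced by a label for the value it would extract. The names HOST, SOURCE, SOURCETYPE and apply_precedence are hypothetical.

```python
# Toy model of the precedence stage only; not Splunk's implementation.
# Assumptions: a setting is identified by its full "TYPE-class" name, and
# [source::...] beats [host::...], which beats [<sourcetype>].
# Each placeholder regex is replaced by a label for the value it extracts.

HOST = {"EXTRACT-a": "a=h1", "EVAL-b": "b=h2"}
SOURCE = {
    "EXTRACT-a": "a=s1",
    "EVAL-b": "b=s2",
    "EVAL-c": "c=s3",
    "EXTRACT-e": "common=s4",
}
SOURCETYPE = {
    "EXTRACT-a": "a=st1",
    "EVAL-b": "b=st2",
    "EXTRACT-c": "c=st3",
    "EXTRACT-d": "common=st4",
}


def apply_precedence(*stanzas_low_to_high):
    """Merge (stanza, settings) pairs, lowest precedence first.

    A later (higher-precedence) stanza replaces a setting with the same name.
    Returns {setting: (winning stanza, value)}.
    """
    merged = {}
    for stanza, settings in stanzas_low_to_high:
        for name, value in settings.items():
            merged[name] = (stanza, value)
    return merged


final = apply_precedence(
    ("sourcetype", SOURCETYPE),
    ("host", HOST),
    ("source", SOURCE),
)
for name in sorted(final):
    print(name, "=", final[name])
```

Under this model none of the host settings survive, while EVAL-c and EXTRACT-c are both kept because they are different settings; which value of c wins is then decided by the execution rules above (EXTRACT before EVAL).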

 

Here is the sample props.conf I used to test all of the above concepts.

 

# Sample Event Generator:
# | makeresults | eval _raw="a:1, b:2, c:3, d:4, e:5, f:6, g:7, h:8, i:9, j:10, k:11, l:12" | collect index=main source="my_source" sourcetype="my_sourcetype" host="my_host"


[source::my_source]
# Test-1: If the same parameter is defined in both the source and the sourcetype stanza, the one from the source stanza is applied
# Result - Only EVAL-first1 from the source stanza is applied; the sourcetype's version is not present in the final list of props settings.
# Proof: first2 is not extracted, even though the sourcetype stanza's regex for first2 is correct, because only the source stanza's EXTRACT-first2 (whose "a::" regex does not match the event) ends up in the final list of parameters.
EVAL-first1 = "first from source"
EXTRACT-first2 = a::(?<first2>\d+)

# Test-2: Precedence and execution are two separate stages
# In this example, both stanzas extract fields second1 and second2, with different values and under different class names
# Result - The EXTRACTs are applied in class-name order: a, then b, then c, then d
# A later EXTRACT does not override the value extracted by an earlier one, so second1 keeps the value from EXTRACT-a and second2 keeps the value from EXTRACT-c
EXTRACT-a = a:(?<second1>\d+)
EXTRACT-d = c:(?<second2>\d+)



[my_sourcetype]
# Test-1
EVAL-first1 = "first from sourcetype"
EXTRACT-first2 = b:(?<first2>\d+)

# Test-2
EXTRACT-b = b:(?<second1>\d+)
EXTRACT-c = d:(?<second2>\d+)

# Test-3: Is a value extracted by EXTRACT overwritten by EVAL?
# Result - Yes. EVAL overwrites the value extracted by EXTRACT
EXTRACT-e = e:(?<third>\d+)
EVAL-third = "third from eval"

# Test-4: Is a value extracted by an earlier EXTRACT overwritten by a later EXTRACT?
# Result - No. EXTRACT-forth1 and EXTRACT-forth2 both extract field forth1, but the value from EXTRACT-forth1 is kept
# In the second case, EXTRACT-forth3 has a wrong regex ("c::") that extracts nothing, so field forth2 keeps the value assigned by EXTRACT-forth4
EXTRACT-forth1 = a:(?<forth1>\d+)
EXTRACT-forth2 = b:(?<forth1>\d+)
EXTRACT-forth3 = c::(?<forth2>\d+)
EXTRACT-forth4 = d:(?<forth2>\d+)
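
The execution stage for Tests 1-4 can be sketched as a toy Python model. It is not Splunk code, and it assumes: PARAMS is the parameter list that survives precedence (the source stanza's EVAL-first1 and EXTRACT-first2 replace the sourcetype's); EVAL values are plain strings rather than eval expressions; and each EXTRACT regex is applied once, using its first match. Python's re writes named groups as (?P<name>...) where the PCRE regexes above use (?<name>...). PARAMS and execute are hypothetical names.

```python
import re

# Toy model of the execution stage for Tests 1-4; not Splunk's implementation.
# Assumptions: PARAMS is what survives the precedence stage, EVAL values are
# plain strings (not eval expressions), and each EXTRACT uses its first match.
# Python's re uses (?P<name>...) where the PCRE regexes above use (?<name>...).

RAW = "a:1, b:2, c:3, d:4, e:5, f:6, g:7, h:8, i:9, j:10, k:11, l:12"

PARAMS = {
    # Test-1: the source stanza's versions replaced the sourcetype's
    "EVAL-first1": "first from source",
    "EXTRACT-first2": r"a::(?P<first2>\d+)",
    # Test-2: a and d come from the source stanza, b and c from the sourcetype
    "EXTRACT-a": r"a:(?P<second1>\d+)",
    "EXTRACT-b": r"b:(?P<second1>\d+)",
    "EXTRACT-c": r"d:(?P<second2>\d+)",
    "EXTRACT-d": r"c:(?P<second2>\d+)",
    # Test-3
    "EXTRACT-e": r"e:(?P<third>\d+)",
    "EVAL-third": "third from eval",
    # Test-4
    "EXTRACT-forth1": r"a:(?P<forth1>\d+)",
    "EXTRACT-forth2": r"b:(?P<forth1>\d+)",
    "EXTRACT-forth3": r"c::(?P<forth2>\d+)",
    "EXTRACT-forth4": r"d:(?P<forth2>\d+)",
}


def execute(params, raw):
    """Apply EXTRACTs in class-name order (first value wins), then EVALs."""
    fields = {}
    extracts = sorted(
        (setting.split("-", 1)[1], regex)
        for setting, regex in params.items()
        if setting.startswith("EXTRACT-")
    )
    for _class_name, regex in extracts:
        match = re.search(regex, raw)
        if match is None:
            continue  # e.g. the non-matching "a::" and "c::" regexes
        for field, value in match.groupdict().items():
            fields.setdefault(field, value)  # an earlier value is kept
    for setting, value in params.items():
        if setting.startswith("EVAL-"):
            fields[setting.split("-", 1)[1]] = value  # EVAL overwrites
    return fields


fields = execute(PARAMS, RAW)
for name in sorted(fields):
    print(name, "=", fields[name])
```

Run as-is, the model gives first1 = "first from source", no first2, second1 = 1, second2 = 4, third = "third from eval", forth1 = 1 and forth2 = 4, in line with the results reported for Tests 1, 3 and 4. In this model second2 comes from the sourcetype's EXTRACT-c, because class-name order, not the stanza, decides execution.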

 

 

Edit: Adding some test results with REPORT and multivalued fields. REPORT behaves much the same as EXTRACT.

props.conf

# Tests with REPORT

# Test-5: Behaviour within the same REPORT
# Result - Like EXTRACT, the second transform in a REPORT does not overwrite the value from the first
REPORT-first_report = first_report1, first_report2

# Test-6: Behaviour across different REPORT classes
# Result - Here too, the second REPORT does not overwrite the value from the first
REPORT-second_report1 = second_report1
REPORT-second_report2 = second_report2

# Test-7: Behaviour with MV_ADD within the same class
# Result - The new value is added to the field (multivalue)
REPORT-third_report = third_report1, third_report2

# Test-8: Behaviour with MV_ADD across different classes
# Result - The new value is added to the field in this case as well
REPORT-fourth_report1 = fourth_report1
REPORT-fourth_report2 = fourth_report2

# Test-9: Behaviour when MV_ADD is left out of one of the transforms
# Result - The transforms.conf stanza without MV_ADD cannot add to or overwrite the existing value.
#      In this case, field fifth_report ends up with two values, 1 and 2
REPORT-fifth_report = fifth_report1, fifth_report2, fifth_report3

transforms.conf

[first_report1]
REGEX = a:(?<first_report>\d+)

[first_report2]
REGEX = b:(?<first_report>\d+)

[second_report1]
REGEX = a:(?<second_report>\d+)

[second_report2]
REGEX = b:(?<second_report>\d+)

[third_report1]
REGEX = a:(?<third_report>\d+)

[third_report2]
REGEX = b:(?<third_report>\d+)
MV_ADD = true

[fourth_report1]
REGEX = a:(?<fourth_report>\d+)

[fourth_report2]
REGEX = b:(?<fourth_report>\d+)
MV_ADD = true

[fifth_report1]
REGEX = a:(?<fifth_report>\d+)

[fifth_report2]
REGEX = b:(?<fifth_report>\d+)
MV_ADD = true

[fifth_report3]
REGEX = c:(?<fifth_report>\d+)
# Skipping MV_ADD here
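
The REPORT and MV_ADD results from Tests 5-9 can be reproduced with a similar toy Python model. It is a sketch, not Splunk code, and it assumes that REPORT classes run in class-name order, that the transforms listed in one REPORT run left to right, and that a transform without MV_ADD discards a value for a field that already exists while one with MV_ADD = true appends it. TRANSFORMS, REPORTS and apply_reports are hypothetical names.

```python
import re

# Toy model of the REPORT / MV_ADD results in Tests 5-9; not Splunk code.
# Assumptions: REPORT classes run in class-name order, the transforms listed
# in one REPORT run left to right, and without MV_ADD a value for an
# already-extracted field is discarded.
# Python's re uses (?P<name>...) where the PCRE regexes above use (?<name>...).

RAW = "a:1, b:2, c:3, d:4, e:5, f:6, g:7, h:8, i:9, j:10, k:11, l:12"

# transforms.conf stanza -> (REGEX, MV_ADD)
TRANSFORMS = {
    "first_report1": (r"a:(?P<first_report>\d+)", False),
    "first_report2": (r"b:(?P<first_report>\d+)", False),
    "second_report1": (r"a:(?P<second_report>\d+)", False),
    "second_report2": (r"b:(?P<second_report>\d+)", False),
    "third_report1": (r"a:(?P<third_report>\d+)", False),
    "third_report2": (r"b:(?P<third_report>\d+)", True),
    "fourth_report1": (r"a:(?P<fourth_report>\d+)", False),
    "fourth_report2": (r"b:(?P<fourth_report>\d+)", True),
    "fifth_report1": (r"a:(?P<fifth_report>\d+)", False),
    "fifth_report2": (r"b:(?P<fifth_report>\d+)", True),
    "fifth_report3": (r"c:(?P<fifth_report>\d+)", False),
}

# props.conf REPORT class -> transforms.conf stanzas, in listed order
REPORTS = {
    "first_report": ["first_report1", "first_report2"],
    "second_report1": ["second_report1"],
    "second_report2": ["second_report2"],
    "third_report": ["third_report1", "third_report2"],
    "fourth_report1": ["fourth_report1"],
    "fourth_report2": ["fourth_report2"],
    "fifth_report": ["fifth_report1", "fifth_report2", "fifth_report3"],
}


def apply_reports(reports, transforms, raw):
    """Return {field: [values]}; more than one value means multivalued."""
    fields = {}
    for class_name in sorted(reports):
        for stanza in reports[class_name]:
            regex, mv_add = transforms[stanza]
            match = re.search(regex, raw)
            if match is None:
                continue
            for field, value in match.groupdict().items():
                if field not in fields:
                    fields[field] = [value]  # first value is always kept
                elif mv_add:
                    fields[field].append(value)  # MV_ADD = true appends
                # otherwise the new value is discarded
    return fields


fields = apply_reports(REPORTS, TRANSFORMS, RAW)
for name in sorted(fields):
    print(name, fields[name])
```

With the sample event, the model yields first_report = 1 and second_report = 1, while third_report, fourth_report and fifth_report each get the two values 1 and 2, matching the results reported in Tests 5-9.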

 
