How to compose the output in time and Scenario_ID order after comparing time tags and keeping only the earliest match?

Jouman
Path Finder

Hi all,

I would like to know how to write SPL that picks scenarios according to the following three rules.

(1) Pick a Scenario_IDx only if its time tag is later than that of the previous Scenario_IDy (x is greater than y).
Any Scenario_IDx whose time tag is earlier than that of its previous scenario can be ignored.
Ex.
The Scenario_ID1 time tag should be later than that of Scenario_Start. (In Ex. 1: Scenario_ID1: 103 > Scenario_Start: 101)
The Scenario_ID2 time tag should be later than those of Scenario_ID1 and Scenario_Start. (In Ex. 1: Scenario_ID2: 104 > Scenario_Start: 101 and Scenario_ID2: 104 > Scenario_ID1: 103)

(2) If the same scenario occurs several times after the previous scenario's time tag, pick the occurrence with the earliest time tag.
Ex. Take Ex. 2 as an example.
For Scenario_ID3, pick Scenario_ID3: 204 only. 
Scenario_Start: 201 
Scenario_ID1: 202  
Scenario_ID2: 203 
Scenario_ID3: 204 
Scenario_ID3: 205 

 
 
(3) If, for a given Scenario_IDy, no Scenario_IDx (x > y) has a time tag later than that of Scenario_IDy, then nothing needs to be listed for Scenario_IDx.
Ex. Take Ex. 3 as an example.
All time tags of Scenario_ID5 are earlier than that of Scenario_ID1.
So the "Expected sequence" does not need to list Scenario_ID5.

Below are the sample original scenario sequence and its information sequence, together with the expected scenario sequence and its information sequence.
All of them are multi-value fields.

Does anyone have a suggestion for SPL that composes the "Expected sequence" and "Expected information sequence" output?
 
Example 1

Original sequence (in time tag):
Scenario_Start: 101
Scenario_ID1: 103
Scenario_ID1: 105
Scenario_ID2: 102
Scenario_ID2: 104

Original information sequence (in time tag):
Scenario_Start_info:AAA
Scenario_ID1_info:BBB
Scenario_ID1_info:CCC
Scenario_ID2_info:DDD
Scenario_ID2_info:EEE

Expected sequence (in time tag):
Scenario_Start: 101
Scenario_ID1: 103
Scenario_ID2: 104

Expected information sequence (in time tag):
Scenario_Start_info:AAA
Scenario_ID1_info:BBB
Scenario_ID2_info:EEE

Example 2

Original sequence (in time tag):
Scenario_Start: 201
Scenario_ID1: 202
Scenario_ID2: 203
Scenario_ID3: 204
Scenario_ID3: 205

Original information sequence (in time tag):
Scenario_Start_info:AAA
Scenario_ID1_info:BBB
Scenario_ID2_info:CCC
Scenario_ID3_info:DDD
Scenario_ID3_info:EEE

Expected sequence (in time tag):
Scenario_Start: 201
Scenario_ID1: 202
Scenario_ID2: 203
Scenario_ID3: 204

Expected information sequence (in time tag):
Scenario_Start_info:AAA
Scenario_ID1_info:BBB
Scenario_ID2_info:CCC
Scenario_ID3_info:DDD

Example 3

Original sequence (in time tag):
Scenario_Start: 301
Scenario_ID1: 305
Scenario_ID5: 302
Scenario_ID5: 303
Scenario_ID5: 304

Original information sequence (in time tag):
Scenario_Start_info:AAA
Scenario_ID1_info:BBB
Scenario_ID5_info:CCC
Scenario_ID5_info:DDD
Scenario_ID5_info:EEE

Expected sequence (in time tag):
Scenario_Start: 301
Scenario_ID1: 305

Expected information sequence (in time tag):
Scenario_Start_info:AAA
Scenario_ID1_info:BBB

Thank you so much.

yuanliu
SplunkTrust

The problem would be easier to attack if you illustrated the data more clearly. I spent many, many hours trying to reverse engineer what the data look like. Can you confirm that the following features are present in the data? Here, sequence is equivalent to your "Example No.".

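| sequence | scenarioid | timetag | infotag |
| --- | --- | --- | --- |
| 1 | Scenario_Start | 101 | AAA |
| 1 | Scenario_ID2 | 102 | DDD |
| 1 | Scenario_ID1 | 103 | BBB |
| 1 | Scenario_ID2 | 104 | EEE |
| 1 | Scenario_ID1 | 105 | CCC |
| 2 | Scenario_Start | 201 | AAA |
| 2 | Scenario_ID1 | 202 | BBB |
| 2 | Scenario_ID2 | 203 | CCC |
| 2 | Scenario_ID3 | 204 | DDD |
| 2 | Scenario_ID3 | 205 | EEE |
| 3 | Scenario_Start | 301 | AAA |
| 3 | Scenario_ID5 | 302 | CCC |
| 3 | Scenario_ID5 | 303 | DDD |
| 3 | Scenario_ID5 | 304 | EEE |
| 3 | Scenario_ID1 | 305 | BBB |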

With this, and a little bit of cheating (see discussion below), I can get to the desired output using some auxiliary variables. In particular, "Scenario_Start" is assigned a step value of 0 because your logic suggests that it must precede Scenario_ID1.

 

| eval step = if(scenarioid == "Scenario_Start", 0, replace(scenarioid, "Scenario_ID", ""))
| eventstats min(timetag) as stepmin by sequence step
| eval stepmin = step.":".stepmin
| eventstats values(stepmin) as stepmin dc(scenarioid) as stepcount values(step) as steps by sequence
| eval expected_min = mvindex(split(mvindex(stepmin, mvfind(steps, step) - 1), ":"), 1) ``` logic (1), (3) ```
| where step == 0 OR timetag > expected_min
| dedup scenarioid sequence ``` logic (2) ```
| fields - step* *min

 

Output is

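| sequence | scenarioid | timetag | infotag |
| --- | --- | --- | --- |
| 1 | Scenario_Start | 101 | AAA |
| 1 | Scenario_ID1 | 103 | BBB |
| 1 | Scenario_ID2 | 104 | EEE |
| 2 | Scenario_Start | 201 | AAA |
| 2 | Scenario_ID1 | 202 | BBB |
| 2 | Scenario_ID2 | 203 | CCC |
| 2 | Scenario_ID3 | 204 | DDD |
| 3 | Scenario_Start | 301 | AAA |
| 3 | Scenario_ID1 | 305 | BBB |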
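
The question asks for the result as multivalue fields, while the output here is one row per event. A minimal sketch of one way to roll the rows back up, assuming it is appended after the final `fields - step* *min` line of the search and that the field names are as shown in the table. The names seq_item, info_item, expected_sequence and expected_info_sequence are made up for illustration.

```spl
| sort 0 sequence timetag ``` keep values in time order; 0 lifts the default result limit ```
| eval seq_item = scenarioid . ": " . timetag
| eval info_item = scenarioid . "_info:" . infotag
| stats list(seq_item) as expected_sequence list(info_item) as expected_info_sequence by sequence
```

list() is used rather than values() because it keeps the incoming order and does not deduplicate. By default it keeps only a limited number of values per field, which should not matter for sequences as short as these.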

Why do I say there's a bit of cheating? Because the code cannot handle cases where subsequent steps have reverted time tags; for example, in sequence 2, if Scenario_ID3 had elements that preceded elements of Scenario_ID2, the above code would give the wrong conclusion. This is because I could not find a method to dynamically update an array element.
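
One way to probe this limitation is a test case like the sketch below. Sequence 4 is invented for illustration and does not come from the thread; the filtering lines are copied unchanged from the search above. Scenario_ID1 has one time tag before Scenario_Start (400) and one after it (405). Under rules (1) and (2) the expected picks would be Scenario_ID1: 405 and then Scenario_ID2: 406. Tracing the search by hand suggests that it compares Scenario_ID2 against the minimum Scenario_ID1 time tag (400) rather than the picked one (405), so it would keep Scenario_ID2: 402 instead.

```spl
| makeresults
| eval _raw = "scenarioid,timetag,infotag,sequence
Scenario_Start,401,AAA,4
Scenario_ID1,400,BBB,4
Scenario_ID1,405,CCC,4
Scenario_ID2,402,DDD,4
Scenario_ID2,406,EEE,4"
| multikv forceheader=1
| table sequence scenarioid timetag infotag
| sort sequence timetag ``` hypothetical sequence 4, not from the thread ```
| eval step = if(scenarioid == "Scenario_Start", 0, replace(scenarioid, "Scenario_ID", ""))
| eventstats min(timetag) as stepmin by sequence step
| eval stepmin = step.":".stepmin
| eventstats values(stepmin) as stepmin dc(scenarioid) as stepcount values(step) as steps by sequence
| eval expected_min = mvindex(split(mvindex(stepmin, mvfind(steps, step) - 1), ":"), 1) ``` logic (1), (3) ```
| where step == 0 OR timetag > expected_min
| dedup scenarioid sequence ``` logic (2) ```
| fields - step* *min
```

If real data can contain this pattern, the comparison would need to use the time tag actually picked for the previous step, which is a sequential dependency that this search does not track.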

Hope this helps.

For verification, the following is used to emulate data

 

| makeresults
| eval _raw = "scenarioid,timetag,infotag,sequence
Scenario_Start,101,AAA,1
Scenario_ID1,103,BBB,1
Scenario_ID1,105,CCC,1
Scenario_ID2,102,DDD,1
Scenario_ID2,104,EEE,1
Scenario_Start,201,AAA,2
Scenario_ID1,202,BBB,2
Scenario_ID2,203,CCC,2
Scenario_ID3,204,DDD,2
Scenario_ID3,205,EEE,2
Scenario_Start,301,AAA,3
Scenario_ID1,305,BBB,3
Scenario_ID5,302,CCC,3
Scenario_ID5,303,DDD,3
Scenario_ID5,304,EEE,3"
| multikv forceheader=1
| table sequence scenarioid timetag infotag
| sort sequence timetag
``` data emulation above ```

 

 


Jouman
Path Finder

Hi Yuan,

Thank you so much!

I tried the method and it works.

The original data you listed is correct.
The "Example No." field in my original data indicates that Scenario_Start and the Scenario_IDx entries come from the same experiment. Therefore, they should be analyzed together.
It is also correct to use "sequence" from your table to group the data.

 

