Solved: How to compose the output in time and Scenario_ID ...

Jouman · ‎11-23-2022

Hi all,

I would like to know how to write a SPL code to solve the issue that is to pick the scenarios follow the 3 logic.

(1) pick the Scenario_IDx whose time tag is later than its previous Scenario_IDy. (x is bigger than y)
Any Scenario_IDx whose time tag is ealier than its previous Scenario can be ignored.
Ex.
Scenario_ID1 time tag should bigger than Scenario_Start. (In Ex.1: Scenario_ID1: 103 > Scenario_Start: 101)
Scenario_ID2 time tag should smaller than Scneario_ID1 and Scenario_Start. (In Ex.1: Scenario_ID2: 104 >Scenario_Start: 101 and Scenario_ID2: 104 > Scenario_ID1: 103)

(2) If there are multiple same scenario later than previous Scenario time tag, pick the one with the earliest time tag.
Ex. Take Ex. 2 as an example.
For Scenario_ID3, pick Scenario_ID3: 204 only.
Scenario_Start: 201
Scenario_ID1: 202
Scenario_ID2: 203
Scenario_ID3: 204
Scenario_ID3: 205

(3) If for the Scenario_IDy, there is no Scenario_IDx later than Scenario_IDy time tag. Then no need to list anything for Scenario_IDx. (x>y)
Ex. Take Ex. 3 as an example.
All time tag of Scenario_ID5 is earlier than the one of Scenario_ID1.
So in "Expected sequence", no need to list Scenario_ID5.

Here are the sample original scenario sequence, the corresponding information sequence and the expected scenario sequence and the corresponding information sequence as well.
Both of them are multi-value fields.

Does anyone have suggestion on SPL code to compose the "Expected sequence" and "Expected information sequence" output?

Example no.	Original sequence (in time tag)	Original information sequence (in time tag)	Expected sequence (in time tag)	Expected information sequence (in time tag)
1	Scenario_Start: 101 Scenario_ID1: 103 Scenario_ID1: 105 Scenario_ID2: 102 Scenario_ID2: 104	Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID1_info:CCC Scenario_ID2_info:DDD Scenario_ID2_info:EEE	Scenario_Start: 101 Scenario_ID1: 103 Scenario_ID2: 104	Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID2_info:EEE
2	Scenario_Start: 201 Scenario_ID1: 202 Scenario_ID2: 203 Scenario_ID3: 204 Scenario_ID3: 205	Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID2_info:CCC Scenario_ID3_info:DDD Scenario_ID3_info:EEE	Scenario_Start: 201 Scenario_ID1: 202 Scenario_ID2: 203 Scenario_ID3: 204	Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID2_info:CCC Scenario_ID3_info:DDD
3	Scenario_Start: 301 Scenario_ID1: 305 Scenario_ID5: 302 Scenario_ID5: 303 Scenario_ID5: 304	Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID5_info:CCC Scenario_ID5_info:DDD Scenario_ID5_info:EEE	Scenario_Start:301 Scenario_ID1:305	Scenario_Start_info:AAA Scenario_ID1_info:BBB

Thank you so much.

yuanliu · ‎11-26-2022

The problem can be easier to attack if you describe it with clearer illustration of data. I spent many, many hours trying to reverse engineer what the data look like. Can you confirm that the following features are present in the data? sequence is equivalent to your "Example No".

sequence	scenarioid	timetag	infotag
1	Scenario_Start	101	AAA
1	Scenario_ID2	102	DDD
1	Scenario_ID1	103	BBB
1	Scenario_ID2	104	EEE
1	Scenario_ID1	105	CCC
2	Scenario_Start	201	AAA
2	Scenario_ID1	202	BBB
2	Scenario_ID2	203	CCC
2	Scenario_ID3	204	DDD
2	Scenario_ID3	205	EEE
3	Scenario_Start	301	AAA
3	Scenario_ID5	302	CCC
3	Scenario_ID5	303	DDD
3	Scenario_ID5	304	EEE
3	Scenario_ID1	305	BBB

With this, and a little bit of cheating (see discussion below), I can get to desired output using some auxiliary variables. In particular, "Scenario_Start" is assigned step value of 0 because your logic suggests that it is important for it to precede Scenario_ID1.

| eval step = if(scenarioid == "Scenario_Start", 0, replace(scenarioid, "Scenario_ID", ""))
| eventstats min(timetag) as stepmin by sequence step
| eval stepmin = step.":".stepmin
| eventstats values(stepmin) as stepmin dc(scenarioid) as stepcount values(step) as steps by sequence
| eval expected_min = mvindex(split(mvindex(stepmin, mvfind(steps, step) - 1), ":"), 1) ``` logic (1), (3) ```
| where step == 0 OR timetag > expected_min
| dedup scenarioid sequence ``` logic (2) ```
| fields - step* *min

Output is

sequence	scenarioid	timetag	infotag
1	Scenario_Start	101	AAA
1	Scenario_ID1	103	BBB
1	Scenario_ID2	104	EEE
2	Scenario_Start	201	AAA
2	Scenario_ID1	202	BBB
2	Scenario_ID2	203	CCC
2	Scenario_ID3	204	DDD
3	Scenario_Start	301	AAA
3	Scenario_ID1	305	BBB

Why do I say there's a bit of cheating? Because the code cannot handle cases when subsequent steps have reverted timetag; for example, in sequence 2, if Scenario_ID3 has elements that precede elements of Scenario_ID2, the above code will give the wrong conclusion. This is because I cannot find a method to dynamically update an array element.

Hope this helps.

For verification, the following is used to emulate data

| makeresults
| eval _raw = "scenarioid,timetag,infotag,sequence
Scenario_Start,101,AAA,1
Scenario_ID1,103,BBB,1
Scenario_ID1,105,CCC,1
Scenario_ID2,102,DDD,1
Scenario_ID2,104,EEE,1
Scenario_Start,201,AAA,2
Scenario_ID1,202,BBB,2
Scenario_ID2,203,CCC,2
Scenario_ID3,204,DDD,2
Scenario_ID3,205,EEE,2
Scenario_Start,301,AAA,3
Scenario_ID1,305,BBB,3
Scenario_ID5,302,CCC,3
Scenario_ID5,303,DDD,3
Scenario_ID5,304,EEE,3"
| multikv forceheader=1
| table sequence scenarioid timetag infotag
| sort sequence timetag
``` data emulation above ```

View solution in original post

Jouman · ‎11-29-2022

Hi Yuan,

Thank you so much!

I try the method and it works.

The original data you listed is correct.
The "Example No." field from my original data is to represent these Scenario_Start, and Scenario_IDx comes from the same experiment. Therefore, they should be analyzed together.
It is also correct to use "sequence" from your table to categorize the data.

yuanliu · ‎11-26-2022

The problem can be easier to attack if you describe it with clearer illustration of data. I spent many, many hours trying to reverse engineer what the data look like. Can you confirm that the following features are present in the data? sequence is equivalent to your "Example No".

sequence	scenarioid	timetag	infotag
1	Scenario_Start	101	AAA
1	Scenario_ID2	102	DDD
1	Scenario_ID1	103	BBB
1	Scenario_ID2	104	EEE
1	Scenario_ID1	105	CCC
2	Scenario_Start	201	AAA
2	Scenario_ID1	202	BBB
2	Scenario_ID2	203	CCC
2	Scenario_ID3	204	DDD
2	Scenario_ID3	205	EEE
3	Scenario_Start	301	AAA
3	Scenario_ID5	302	CCC
3	Scenario_ID5	303	DDD
3	Scenario_ID5	304	EEE
3	Scenario_ID1	305	BBB

With this, and a little bit of cheating (see discussion below), I can get to desired output using some auxiliary variables. In particular, "Scenario_Start" is assigned step value of 0 because your logic suggests that it is important for it to precede Scenario_ID1.

| eval step = if(scenarioid == "Scenario_Start", 0, replace(scenarioid, "Scenario_ID", ""))
| eventstats min(timetag) as stepmin by sequence step
| eval stepmin = step.":".stepmin
| eventstats values(stepmin) as stepmin dc(scenarioid) as stepcount values(step) as steps by sequence
| eval expected_min = mvindex(split(mvindex(stepmin, mvfind(steps, step) - 1), ":"), 1) ``` logic (1), (3) ```
| where step == 0 OR timetag > expected_min
| dedup scenarioid sequence ``` logic (2) ```
| fields - step* *min

Output is

sequence	scenarioid	timetag	infotag
1	Scenario_Start	101	AAA
1	Scenario_ID1	103	BBB
1	Scenario_ID2	104	EEE
2	Scenario_Start	201	AAA
2	Scenario_ID1	202	BBB
2	Scenario_ID2	203	CCC
2	Scenario_ID3	204	DDD
3	Scenario_Start	301	AAA
3	Scenario_ID1	305	BBB

Why do I say there's a bit of cheating? Because the code cannot handle cases when subsequent steps have reverted timetag; for example, in sequence 2, if Scenario_ID3 has elements that precede elements of Scenario_ID2, the above code will give the wrong conclusion. This is because I cannot find a method to dynamically update an array element.

Hope this helps.

For verification, the following is used to emulate data

| makeresults
| eval _raw = "scenarioid,timetag,infotag,sequence
Scenario_Start,101,AAA,1
Scenario_ID1,103,BBB,1
Scenario_ID1,105,CCC,1
Scenario_ID2,102,DDD,1
Scenario_ID2,104,EEE,1
Scenario_Start,201,AAA,2
Scenario_ID1,202,BBB,2
Scenario_ID2,203,CCC,2
Scenario_ID3,204,DDD,2
Scenario_ID3,205,EEE,2
Scenario_Start,301,AAA,3
Scenario_ID1,305,BBB,3
Scenario_ID5,302,CCC,3
Scenario_ID5,303,DDD,3
Scenario_ID5,304,EEE,3"
| multikv forceheader=1
| table sequence scenarioid timetag infotag
| sort sequence timetag
``` data emulation above ```

How to compose the output in time and Scenario_ID sequence after comparing and filter the earliest one?

chart

table

AppDynamics Summer Webinars

SOCin’ it to you at Splunk University

Credit Card Data Protection & PCI Compliance with Splunk Edge Processor

Are you a member of the Splunk Community?