Hi all,
I would like to know how to write a SPL code to solve the issue that is to pick the scenarios follow the 3 logic.
(1) pick the Scenario_IDx whose time tag is later than its previous Scenario_IDy. (x is bigger than y)
Any Scenario_IDx whose time tag is ealier than its previous Scenario can be ignored.
Ex.
Scenario_ID1 time tag should bigger than Scenario_Start. (In Ex.1: Scenario_ID1: 103 > Scenario_Start: 101)
Scenario_ID2 time tag should smaller than Scneario_ID1 and Scenario_Start. (In Ex.1: Scenario_ID2: 104 >Scenario_Start: 101 and Scenario_ID2: 104 > Scenario_ID1: 103)
(2) If there are multiple same scenario later than previous Scenario time tag, pick the one with the earliest time tag.
Ex. Take Ex. 2 as an example.
For Scenario_ID3, pick Scenario_ID3: 204 only.
Scenario_Start: 201
Scenario_ID1: 202
Scenario_ID2: 203
Scenario_ID3: 204
Scenario_ID3: 205
Example no. | Original sequence (in time tag) | Original information sequence (in time tag) | Expected sequence (in time tag) | Expected information sequence (in time tag) |
1 | Scenario_Start: 101 Scenario_ID1: 103 Scenario_ID1: 105 Scenario_ID2: 102 Scenario_ID2: 104 | Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID1_info:CCC Scenario_ID2_info:DDD Scenario_ID2_info:EEE | Scenario_Start: 101 Scenario_ID1: 103 Scenario_ID2: 104 | Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID2_info:EEE |
2 | Scenario_Start: 201 Scenario_ID1: 202 Scenario_ID2: 203 Scenario_ID3: 204 Scenario_ID3: 205 | Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID2_info:CCC Scenario_ID3_info:DDD Scenario_ID3_info:EEE | Scenario_Start: 201 Scenario_ID1: 202 Scenario_ID2: 203 Scenario_ID3: 204 | Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID2_info:CCC Scenario_ID3_info:DDD |
3 | Scenario_Start: 301 Scenario_ID1: 305 Scenario_ID5: 302 Scenario_ID5: 303 Scenario_ID5: 304 | Scenario_Start_info:AAA Scenario_ID1_info:BBB Scenario_ID5_info:CCC Scenario_ID5_info:DDD Scenario_ID5_info:EEE | Scenario_Start:301 Scenario_ID1:305 | Scenario_Start_info:AAA Scenario_ID1_info:BBB |
The problem can be easier to attack if you describe it with clearer illustration of data. I spent many, many hours trying to reverse engineer what the data look like. Can you confirm that the following features are present in the data? sequence is equivalent to your "Example No".
sequence | scenarioid | timetag | infotag |
1 | Scenario_Start | 101 | AAA |
1 | Scenario_ID2 | 102 | DDD |
1 | Scenario_ID1 | 103 | BBB |
1 | Scenario_ID2 | 104 | EEE |
1 | Scenario_ID1 | 105 | CCC |
2 | Scenario_Start | 201 | AAA |
2 | Scenario_ID1 | 202 | BBB |
2 | Scenario_ID2 | 203 | CCC |
2 | Scenario_ID3 | 204 | DDD |
2 | Scenario_ID3 | 205 | EEE |
3 | Scenario_Start | 301 | AAA |
3 | Scenario_ID5 | 302 | CCC |
3 | Scenario_ID5 | 303 | DDD |
3 | Scenario_ID5 | 304 | EEE |
3 | Scenario_ID1 | 305 | BBB |
With this, and a little bit of cheating (see discussion below), I can get to desired output using some auxiliary variables. In particular, "Scenario_Start" is assigned step value of 0 because your logic suggests that it is important for it to precede Scenario_ID1.
| eval step = if(scenarioid == "Scenario_Start", 0, replace(scenarioid, "Scenario_ID", ""))
| eventstats min(timetag) as stepmin by sequence step
| eval stepmin = step.":".stepmin
| eventstats values(stepmin) as stepmin dc(scenarioid) as stepcount values(step) as steps by sequence
| eval expected_min = mvindex(split(mvindex(stepmin, mvfind(steps, step) - 1), ":"), 1) ``` logic (1), (3) ```
| where step == 0 OR timetag > expected_min
| dedup scenarioid sequence ``` logic (2) ```
| fields - step* *min
Output is
sequence | scenarioid | timetag | infotag |
1 | Scenario_Start | 101 | AAA |
1 | Scenario_ID1 | 103 | BBB |
1 | Scenario_ID2 | 104 | EEE |
2 | Scenario_Start | 201 | AAA |
2 | Scenario_ID1 | 202 | BBB |
2 | Scenario_ID2 | 203 | CCC |
2 | Scenario_ID3 | 204 | DDD |
3 | Scenario_Start | 301 | AAA |
3 | Scenario_ID1 | 305 | BBB |
Why do I say there's a bit of cheating? Because the code cannot handle cases when subsequent steps have reverted timetag; for example, in sequence 2, if Scenario_ID3 has elements that precede elements of Scenario_ID2, the above code will give the wrong conclusion. This is because I cannot find a method to dynamically update an array element.
Hope this helps.
For verification, the following is used to emulate data
| makeresults
| eval _raw = "scenarioid,timetag,infotag,sequence
Scenario_Start,101,AAA,1
Scenario_ID1,103,BBB,1
Scenario_ID1,105,CCC,1
Scenario_ID2,102,DDD,1
Scenario_ID2,104,EEE,1
Scenario_Start,201,AAA,2
Scenario_ID1,202,BBB,2
Scenario_ID2,203,CCC,2
Scenario_ID3,204,DDD,2
Scenario_ID3,205,EEE,2
Scenario_Start,301,AAA,3
Scenario_ID1,305,BBB,3
Scenario_ID5,302,CCC,3
Scenario_ID5,303,DDD,3
Scenario_ID5,304,EEE,3"
| multikv forceheader=1
| table sequence scenarioid timetag infotag
| sort sequence timetag
``` data emulation above ```
Hi Yuan,
Thank you so much!
I try the method and it works.
The original data you listed is correct.
The "Example No." field from my original data is to represent these Scenario_Start, and Scenario_IDx comes from the same experiment. Therefore, they should be analyzed together.
It is also correct to use "sequence" from your table to categorize the data.
The problem can be easier to attack if you describe it with clearer illustration of data. I spent many, many hours trying to reverse engineer what the data look like. Can you confirm that the following features are present in the data? sequence is equivalent to your "Example No".
sequence | scenarioid | timetag | infotag |
1 | Scenario_Start | 101 | AAA |
1 | Scenario_ID2 | 102 | DDD |
1 | Scenario_ID1 | 103 | BBB |
1 | Scenario_ID2 | 104 | EEE |
1 | Scenario_ID1 | 105 | CCC |
2 | Scenario_Start | 201 | AAA |
2 | Scenario_ID1 | 202 | BBB |
2 | Scenario_ID2 | 203 | CCC |
2 | Scenario_ID3 | 204 | DDD |
2 | Scenario_ID3 | 205 | EEE |
3 | Scenario_Start | 301 | AAA |
3 | Scenario_ID5 | 302 | CCC |
3 | Scenario_ID5 | 303 | DDD |
3 | Scenario_ID5 | 304 | EEE |
3 | Scenario_ID1 | 305 | BBB |
With this, and a little bit of cheating (see discussion below), I can get to desired output using some auxiliary variables. In particular, "Scenario_Start" is assigned step value of 0 because your logic suggests that it is important for it to precede Scenario_ID1.
| eval step = if(scenarioid == "Scenario_Start", 0, replace(scenarioid, "Scenario_ID", ""))
| eventstats min(timetag) as stepmin by sequence step
| eval stepmin = step.":".stepmin
| eventstats values(stepmin) as stepmin dc(scenarioid) as stepcount values(step) as steps by sequence
| eval expected_min = mvindex(split(mvindex(stepmin, mvfind(steps, step) - 1), ":"), 1) ``` logic (1), (3) ```
| where step == 0 OR timetag > expected_min
| dedup scenarioid sequence ``` logic (2) ```
| fields - step* *min
Output is
sequence | scenarioid | timetag | infotag |
1 | Scenario_Start | 101 | AAA |
1 | Scenario_ID1 | 103 | BBB |
1 | Scenario_ID2 | 104 | EEE |
2 | Scenario_Start | 201 | AAA |
2 | Scenario_ID1 | 202 | BBB |
2 | Scenario_ID2 | 203 | CCC |
2 | Scenario_ID3 | 204 | DDD |
3 | Scenario_Start | 301 | AAA |
3 | Scenario_ID1 | 305 | BBB |
Why do I say there's a bit of cheating? Because the code cannot handle cases when subsequent steps have reverted timetag; for example, in sequence 2, if Scenario_ID3 has elements that precede elements of Scenario_ID2, the above code will give the wrong conclusion. This is because I cannot find a method to dynamically update an array element.
Hope this helps.
For verification, the following is used to emulate data
| makeresults
| eval _raw = "scenarioid,timetag,infotag,sequence
Scenario_Start,101,AAA,1
Scenario_ID1,103,BBB,1
Scenario_ID1,105,CCC,1
Scenario_ID2,102,DDD,1
Scenario_ID2,104,EEE,1
Scenario_Start,201,AAA,2
Scenario_ID1,202,BBB,2
Scenario_ID2,203,CCC,2
Scenario_ID3,204,DDD,2
Scenario_ID3,205,EEE,2
Scenario_Start,301,AAA,3
Scenario_ID1,305,BBB,3
Scenario_ID5,302,CCC,3
Scenario_ID5,303,DDD,3
Scenario_ID5,304,EEE,3"
| multikv forceheader=1
| table sequence scenarioid timetag infotag
| sort sequence timetag
``` data emulation above ```