I would like to create a table that displays the first and last event from a set of duplicate events. A duplicate run may be interrupted by a non-duplicate event, in which case I also want to display the first and last message when the duplicate events appear again.
For example, consider this list of events
| makeresults count=10
| streamstats count
| eval _time=1599978591-(count*60)
| eval Message = case(count=10, "MessageA", count=9, "MessageA", count=8, "MessageA", count=7, "MessageB", count=6, "MessageB", count=5, "MessageB", count=4, "MessageA", count=3, "MessageA", count=2, "MessageA", count=1, "MessageB")
| reverse
| table _time, Message
which will create a table similar to:
_time | Message |
2020-09-13 06:19:51 | MessageA |
2020-09-13 06:20:51 | MessageA |
2020-09-13 06:21:51 | MessageA |
2020-09-13 06:22:51 | MessageB |
2020-09-13 06:23:51 | MessageB |
2020-09-13 06:24:51 | MessageB |
2020-09-13 06:25:51 | MessageA |
2020-09-13 06:26:51 | MessageA |
2020-09-13 06:27:51 | MessageA |
2020-09-13 06:28:51 | MessageB |
Now, I would like to display the first and last time a message is seen, removing any duplicates, but it needs to consider that the same message may be seen again in another sequence, which should be treated as a different run and displayed separately.
I thought of adding this
| stats earliest(_time) as Earliest, latest(_time) as Latest BY Message
| eval FirstEvent=strftime(Earliest,"%+"), LastEvent=strftime(Latest,"%+")
| table FirstEvent, LastEvent, Message
but it doesn't take into account that there may be multiple sequences of repeated events, so it generates
FirstEvent | LastEvent | Message |
Sun Sep 13 06:19:51 UTC 2020 | Sun Sep 13 06:27:51 UTC 2020 | MessageA |
Sun Sep 13 06:22:51 UTC 2020 | Sun Sep 13 06:28:51 UTC 2020 | MessageB |
But what I desire is the following output
FirstEvent | LastEvent | Message |
Sun Sep 13 06:19:51 UTC 2020 | Sun Sep 13 06:21:51 UTC 2020 | MessageA |
Sun Sep 13 06:22:51 UTC 2020 | Sun Sep 13 06:24:51 UTC 2020 | MessageB |
Sun Sep 13 06:25:51 UTC 2020 | Sun Sep 13 06:27:51 UTC 2020 | MessageA |
Sun Sep 13 06:28:51 UTC 2020 | Sun Sep 13 06:28:51 UTC 2020 | MessageB |
Any help is extremely appreciated. Thanks in advance.
Roger Cruz
You are not the only one who struggles with streamstats.
| makeresults count=10
| streamstats count
| eval _time=1599978591-(count*60)
| eval Message = case(count=10, "MessageA", count=9, "MessageA", count=8, "MessageA", count=7, "MessageB", count=6, "MessageB", count=5, "MessageB", count=4, "MessageA", count=3, "MessageA", count=2, "MessageA", count=1, "MessageB")
| reverse
| table _time, Message
| streamstats last(Message) as prev_Message by Message current=false global=false reset_on_change=true # finds the first event of each run: current=false makes prev_Message null for the first event of a sequence (e.g. MessageA, MessageA, MessageA), and reset_on_change=true resets the aggregation (last, count, or any other function) whenever the value of a by-clause field changes
| eval prev_Message=if(isnull(prev_Message),"first",null())
| sort 0 - _time # reverse the events so the last event of each run comes first
| streamstats last(Message) as prev_Message_1 by Message current=false global=false reset_on_change=true # same streamstats again; on the reversed order it flags the last event of each run
| sort 0 _time
| eval prev_Message_1=if(isnull(prev_Message_1) AND (isnull(prev_Message) OR prev_Message=="first"),"last",null())
| where isnotnull(prev_Message) OR isnotnull(prev_Message_1)
| eval when=coalesce(prev_Message,prev_Message_1)
| fields - prev*
| xyseries Message when _time grouped=true
| convert ctime(first) ctime(last)
| eval last=if(isnull(last),first,last)
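An equivalent way to think about the same logic (an untested sketch; the field names prev_Message, new_run, and run_id here are my own) is to flag every point where Message changes, number the runs with a running sum, and then aggregate per run:

| streamstats current=false window=1 last(Message) as prev_Message
| eval new_run=if(isnull(prev_Message) OR Message!=prev_Message, 1, 0) # 1 marks the start of a new contiguous run
| streamstats sum(new_run) as run_id # running sum turns the flags into a run number: 1, 1, 1, 2, 2, 2, 3, ...
| stats earliest(_time) as Earliest, latest(_time) as Latest by run_id, Message
| eval FirstEvent=strftime(Earliest,"%+"), LastEvent=strftime(Latest,"%+")
| sort run_id
| table FirstEvent, LastEvent, Message

On the sample data this should produce one row per contiguous run (MessageA, MessageB, MessageA, MessageB), since a single-event run gets identical Earliest and Latest values.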
Check out some more info at the link below:
https://www.youtube.com/watch?v=xzgFTqSFx7w
It's a little heavy, but you can give it a try:
| makeresults count=10
| streamstats count
| eval _time=1599978591-(count*60)
| eval Message = case(count=10, "MessageA", count=9, "MessageA", count=8, "MessageA", count=7, "MessageB", count=6, "MessageB", count=5, "MessageB", count=4, "MessageA", count=3, "MessageA", count=2, "MessageA", count=1, "MessageB")
| reverse
| table _time, Message
| streamstats last(Message) as prev_Message by Message current=false global=false reset_on_change=true
| eval prev_Message=if(isnull(prev_Message),"first",null())
| sort 0 - _time
| streamstats last(Message) as prev_Message_1 by Message current=false global=false reset_on_change=true
| sort 0 _time
| eval prev_Message_1=if(isnull(prev_Message_1) AND (isnull(prev_Message) OR prev_Message=="first"),"last",null())
| where isnotnull(prev_Message) OR isnotnull(prev_Message_1)
| eval when=coalesce(prev_Message,prev_Message_1)
| fields - prev*
| xyseries Message when _time grouped=true
| convert ctime(first) ctime(last)
| eval last=if(isnull(last),first,last)
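The xyseries step above leaves columns named Message, first, and last. To match the exact column layout asked for in the question, one could append something like this (untested sketch):

| rename first as FirstEvent, last as LastEvent
| table FirstEvent, LastEvent, Message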
@thambisetty
I must admit that I need a Splunk PhD to understand that query, but it seems to work! I'm glad my confusing request was understood. Thank you so much!
Could you explain the logic behind this query in layman's terms?
I tried working with dedup and stats, but nothing I came up with could deal with the fact that a repeating sequence could be interrupted, requiring me to reset the start and end. My Google search magic also failed to find any other person asking the same question. Am I the first one ever to need the first and last events of repeating sequences?