Hi, I can't find a similar example already answered, so here goes:
The data looks like this -
_time, Terminal, MessageNumber 03:11, 10000023, 04 03:04, 80000011, 03 02:56, 10000023, 03 02:21, 50000055, 31 02:04, 10000023, 02 01:27, 60000054, 48 01:09, 10000023, 02 01:01, 70000089, 52 01:00, 10000023, 01 00:59, 80000090, 29
I need to find consecutive duplicates in the 'MessageNumber' field, for each 'Terminal'. Streamstats works for individual terminals, but I've not found a method for searching all.
Would really appreciate any help!
I think this should work:
| sort Terminal,_time | streamstats reset_on_change=true count by Terminal,MessageNumber | where count>1
Without sorting, the
streamstats reset_on_change=true doesn't work, as it also resets on changes in the Terminal value. But by sorting primarily on Terminal, that issue is gone.
Thanks for this, it makes sense to sort the data first.
If I limit search to the day an example occurred, it is found, but if I add a day either side it's not. Furthermore, searching larger periods of time produce erroneous results where MessageNumber is definitely not duplicated. I've removed streamstats and checked only the sorted result to confirm this.
FYI I needed to add 0 after sort to avoid the 10k limit. I don't think this had a negative impact, but thought it was worth mentioning.
That approach does not work, as the counter does not reset when a different messagenumber is seen for the same terminal. If for a certain terminal, you first see number 02, then 03 and then 02 again, that does not count as a duplicate, as per @markyelland his comment https://answers.splunk.com/comments/642701/view.html
Can you perhaps clarify what the expected output should be, based on the example you gave?
You mention "consecutive duplicates in the 'MessageNumber' field, for each 'Terminal'", so for the sample data, that would be these?
02:04, 10000023, 02 01:09, 10000023, 02
Finding duplicates is not too hard, eventstats or streamstats can help there. The difficult bit is the "consecutive". Do I understand correctly that if there would have been a
01:39, 10000023, 03 in between the above 2 events, they would not be considered as "consecutive duplicate"?
Hi FrankVI, yes you're correct as 02 should only appear once in sequence. The second event from the terminal should have 03, and if so, would not be of interest.
The difficulty comes when searching all terminals. I've used
streamstats count BY MessageNumber reset_on_change=true | where count >1 successfully to search individual Terminals, but my data is from thousands of devices.
Desired output might look something like this -
_time, Terminal, Message Number, Count
02:04, 10000023, 02, 2