Archive

Find consecutive duplicates in a field, by terminal

New Member

Hi, I can't find a similar example already answered, so here goes:

The data looks like this -

    _time, Terminal, MessageNumber
   03:11, 10000023, 04
   03:04, 80000011, 03
   02:56, 10000023, 03
   02:21, 50000055, 31
   02:04, 10000023, 02
   01:27, 60000054, 48
   01:09, 10000023, 02
   01:01, 70000089, 52
   01:00, 10000023, 01
   00:59, 80000090, 29

I need to find consecutive duplicates in the 'MessageNumber' field, for each 'Terminal'. Streamstats works for individual terminals, but I've not found a method for searching all.

Would really appreciate any help!

0 Karma

Ultra Champion

I think this should work:

| sort Terminal,_time
| streamstats reset_on_change=true count by Terminal,MessageNumber
| where count>1

Without sorting, the streamstats reset_on_change=true doesn't work, as it also resets on changes in the Terminal value. But by sorting primarily on Terminal, that issue is gone.

0 Karma

New Member

Thanks for this, it makes sense to sort the data first.

If I limit search to the day an example occurred, it is found, but if I add a day either side it's not. Furthermore, searching larger periods of time produce erroneous results where MessageNumber is definitely not duplicated. I've removed streamstats and checked only the sorted result to confirm this.

Any thoughts?

FYI I needed to add 0 after sort to avoid the 10k limit. I don't think this had a negative impact, but thought it was worth mentioning.

0 Karma

Champion

You can try something :

index=<your_index> | table _time  Terminal MessageNumber | sort _time  | streamstats current=t count by Terminal, MessageNumber | where count>1
0 Karma

Ultra Champion

That approach does not work, as the counter does not reset when a different messagenumber is seen for the same terminal. If for a certain terminal, you first see number 02, then 03 and then 02 again, that does not count as a duplicate, as per @markyelland his comment https://answers.splunk.com/comments/642701/view.html

0 Karma

Champion

Ohh!! ok got it. Thanks!!

0 Karma

Ultra Champion

Can you perhaps clarify what the expected output should be, based on the example you gave?

You mention "consecutive duplicates in the 'MessageNumber' field, for each 'Terminal'", so for the sample data, that would be these?

02:04, 10000023, 02
01:09, 10000023, 02

Finding duplicates is not too hard, eventstats or streamstats can help there. The difficult bit is the "consecutive". Do I understand correctly that if there would have been a 01:39, 10000023, 03 in between the above 2 events, they would not be considered as "consecutive duplicate"?

0 Karma

New Member

Hi FrankVI, yes you're correct as 02 should only appear once in sequence. The second event from the terminal should have 03, and if so, would not be of interest.

The difficulty comes when searching all terminals. I've used streamstats count BY MessageNumber reset_on_change=true | where count >1 successfully to search individual Terminals, but my data is from thousands of devices.

Desired output might look something like this -

_time, Terminal, Message Number, Count
02:04, 10000023, 02, 2

0 Karma