Splunk Search

Find consecutive duplicates in a field, by terminal

markyelland
New Member

Hi, I can't find a similar example already answered, so here goes:

The data looks like this -

    _time, Terminal, MessageNumber
   03:11, 10000023, 04
   03:04, 80000011, 03
   02:56, 10000023, 03
   02:21, 50000055, 31
   02:04, 10000023, 02
   01:27, 60000054, 48
   01:09, 10000023, 02
   01:01, 70000089, 52
   01:00, 10000023, 01
   00:59, 80000090, 29

I need to find consecutive duplicates in the 'MessageNumber' field, for each 'Terminal'. Streamstats works for individual terminals, but I've not found a method for searching all.

Would really appreciate any help!

0 Karma

FrankVl
Ultra Champion

I think this should work:

| sort Terminal,_time
| streamstats reset_on_change=true count by Terminal,MessageNumber
| where count>1

Without sorting, the streamstats reset_on_change=true doesn't work, as it also resets on changes in the Terminal value. But by sorting primarily on Terminal, that issue is gone.

0 Karma

markyelland
New Member

Thanks for this, it makes sense to sort the data first.

If I limit search to the day an example occurred, it is found, but if I add a day either side it's not. Furthermore, searching larger periods of time produce erroneous results where MessageNumber is definitely not duplicated. I've removed streamstats and checked only the sorted result to confirm this.

Any thoughts?

FYI I needed to add 0 after sort to avoid the 10k limit. I don't think this had a negative impact, but thought it was worth mentioning.

0 Karma

p_gurav
Champion

You can try something :

index=<your_index> | table _time  Terminal MessageNumber | sort _time  | streamstats current=t count by Terminal, MessageNumber | where count>1
0 Karma

FrankVl
Ultra Champion

That approach does not work, as the counter does not reset when a different messagenumber is seen for the same terminal. If for a certain terminal, you first see number 02, then 03 and then 02 again, that does not count as a duplicate, as per @markyelland his comment https://answers.splunk.com/comments/642701/view.html

0 Karma

p_gurav
Champion

Ohh!! ok got it. Thanks!!

0 Karma

FrankVl
Ultra Champion

Can you perhaps clarify what the expected output should be, based on the example you gave?

You mention "consecutive duplicates in the 'MessageNumber' field, for each 'Terminal'", so for the sample data, that would be these?

02:04, 10000023, 02
01:09, 10000023, 02

Finding duplicates is not too hard, eventstats or streamstats can help there. The difficult bit is the "consecutive". Do I understand correctly that if there would have been a 01:39, 10000023, 03 in between the above 2 events, they would not be considered as "consecutive duplicate"?

0 Karma

markyelland
New Member

Hi FrankVI, yes you're correct as 02 should only appear once in sequence. The second event from the terminal should have 03, and if so, would not be of interest.

The difficulty comes when searching all terminals. I've used streamstats count BY MessageNumber reset_on_change=true | where count >1 successfully to search individual Terminals, but my data is from thousands of devices.

Desired output might look something like this -

_time, Terminal, Message Number, Count
02:04, 10000023, 02, 2

0 Karma
Get Updates on the Splunk Community!

Observability | Use Synthetic Monitoring for Website Metadata Verification

If you are on Splunk Observability Cloud, you may already have Synthetic Monitoringin your observability ...

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...

.conf24 | Personalize your .conf experience with Learning Paths!

Personalize your .conf24 Experience Learning paths allow you to level up your skill sets and dive deeper ...