Splunk Search

Applying "dedup" in a rolling time window?

dbryan
Path Finder

I want to deduplicate some events within a time period, but the period is a rolling 24-hour window, so I can't simply dedup on one of the date fields. The only approach I've found so far is transaction, but that is a very expensive operation for something this trivial, and it adds the processing and cognitive overhead of picking the correct values out of the multivalued fields I end up with.


Ayn
Legend

I'm not sure I understood your requirement, so let me know if I've misinterpreted the question. As I understand it, you have a search running over a rolling 24-hour window and you want to make sure certain events don't show up more than once. I'm not sure what defines a duplicate, though: time plus the value of a specific field? If so, let's say you want only unique values of the field myfield within some chosen time period, say one hour. I imagine this would do the trick:

... | bucket _time span=1h | dedup myfield _time

Ayn
Legend

Well yes, what bucket does is precisely divide time into discrete buckets, so an event ends up in either one bucket (with regard to _time) or another. Off the top of my head, I don't know a way to handle this kind of situation other than using transaction.


dbryan
Path Finder

That's pretty much my problem, yes, but the difficulty is that the time period is rolling. My understanding is that if I aggregate events into buckets, events in the last 5 minutes of one bucket would not be deduplicated against events in the first 5 minutes of the next. At the moment I'm doing transaction mykey maxspan=24h.
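
For reference, one possible way to get a rolling comparison without transaction is streamstats, which can carry the timestamp of the previous event with the same key onto each event. The following is only a sketch under assumptions, not something confirmed in this thread: it assumes the key field is mykey (as in the transaction call above), that the window is 24 hours (86400 seconds), and that an event counts as a duplicate when the previous event with the same key arrived less than 24 hours before it. prev_time is a made-up helper field name.

```
... | sort 0 _time
    | streamstats current=f last(_time) as prev_time by mykey
    | where isnull(prev_time) OR (_time - prev_time) > 86400
    | fields - prev_time
```

Note that this compares each event against the immediately preceding event for the same key, whether or not that event was itself kept. A steady stream of repeats arriving less than 24 hours apart would therefore be reduced to just the first one, which may or may not match the grouping that transaction with maxspan produces. Unlike the bucket approach, there are no fixed boundaries, so events 5 minutes apart are compared no matter where they fall on the clock.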
