We're finding that when large files are downloaded from the Internet, the application whitelisting client reports a "new file" several times, each with a different hash, while the download completes.
I essentially want to dedup the events by host, file, and path, but only over a very limited time window (like 5 minutes). If I use transaction to group those events and it puts the file hash values in a multivalue field (in time order rather than sorted order, I believe), how do I extract just the last hash?
Or is there some way to combine dedup and bin, or something like that?
Thanks.
C
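How about using bucket (aka bin) and stats instead?
... | bucket _time span=5m | stats first(hash) by _time host file path
stats sees events newest-first by default, so first() returns the most recent hash in each bucket; last() would return the oldest one.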
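A slightly more robust sketch, assuming the field names host, file, path and hash from the question (time_bucket is a hypothetical name): binning into a separate field leaves _time untouched, so latest() can still pick the chronologically newest hash regardless of event order. dc(hash) and the min/max timestamps are only there to show how many intermediate hashes were collapsed and over what span.

```spl
... | bin span=5m _time AS time_bucket
    | stats latest(hash) AS final_hash dc(hash) AS hash_versions
            min(_time) AS first_seen max(_time) AS last_seen
            BY time_bucket host file path
    | convert ctime(time_bucket) ctime(first_seen) ctime(last_seen)
```

Because bin uses fixed 5-minute boundaries, a download that straddles one (say 10:04 to 10:06) will still produce two rows; the transaction route with maxspan groups by elapsed time instead of aligned buckets.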
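If you still want to go the transaction/multivalue-field route, you could probably get there with eval's mvindex function (an index of "-1" returns the last item in the list). Note that transaction only keeps a field's values as an ordered list when you pass mvlist=true (or mvlist with a field list); by default it stores the unique values sorted alphabetically, so the last item is not necessarily the newest hash.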
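A minimal sketch of the transaction route, again assuming fields named host, file, path and hash (final_hash is a hypothetical name). mvlist=hash keeps hash as a list in event order instead of a unique, sorted set, and mvindex(hash, -1) takes the last element. It is worth spot-checking a few results to confirm the list comes out oldest-first; if it is the other way round, use mvindex(hash, 0) instead.

```spl
... | transaction host file path maxspan=5m mvlist=hash
    | eval final_hash=mvindex(hash, -1)
    | table _time host file path eventcount final_hash
```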
Thanks, Ayn! Quick question: if I run dedup after the bucket command, will Splunk only dedup events within each bucket, or will it dedup over all events?
Running dedup on hash alone dedups over all events. But if you do a
... | dedup hash _time | ...
you'll dedup on the combination of the fields, so in this case you'll get one event per distinct hash in each time bucket.
/K
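For the original goal (one event per host/file/path per five minutes), a minimal sketch, assuming the same field names, is to put those fields together with the binned _time into the dedup. dedup keeps the first event it sees for each combination, and with the default newest-first ordering that is the most recent event, i.e. the one carrying the final hash. event_time is a hypothetical field used only to keep the real timestamp, since bin overwrites _time.

```spl
... | eval event_time=_time
    | bin span=5m _time
    | dedup host file path _time
    | convert ctime(event_time)
    | table event_time host file path hash
```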