How to delete some duplicated events automatically?

jmcr
New Member

Hello all,

I need to delete duplicated events because one of my data sources sends duplicates. Each event has an "id" field and a "version" field, so I can identify the latest one, keep it, and delete the others. I need this process to run automatically, for example every hour. Any suggestions?

Thanks in advance

Richfez
SplunkTrust

I don't know if that would be wise. It certainly might have unintended consequences.

But, there are solutions.

Is there any way to fix the sending device so it doesn't send duplicates?

If that's not possible, then to "delete" the others, you could do one of a few things.

You could possibly set up a summary index and send only the latest events to it, but no: newer versions arriving later would still leave duplicates there, so you'd have to clean out that summary index regularly. Hmm.

You could build a lookup out of those, if they're not too big, and just overwrite it with only the 'best current values' every hour as a scheduled report doing an outputlookup at the end.
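A minimal sketch of that scheduled report, assuming the same placeholder index=foo and sourcetype=bar used elsewhere in this reply and a hypothetical lookup file name latest_by_id.csv. It sorts so the highest version comes first, keeps the first event per id, and overwrites the lookup (outputlookup replaces the file's contents by default). The time range would come from the report's schedule, and the table fields are only a guess at what you'd want to keep.

```spl
index=foo sourcetype=bar
| sort 0 -version
| dedup id
| table id version _time
| outputlookup latest_by_id.csv
```

One thing to check before relying on this: if "version" is not purely numeric, the sort is lexicographic, so "10" could sort below "9".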

But the easiest option is probably to work around it in your SPL, perhaps with a subsearch.

index=foo sourcetype=bar 
    [ search index=foo sourcetype=bar 
    | stats max(version) as version by id 
    | fields version id ]

If you haven't dealt with subsearches before, ... well, they're pretty useful at times.

The subsearch is inside the [] brackets, and *it runs first*.  Once it completes, it returns its results back into the main search (formatted by default with () and AND and OR and whatnot).  Then it's part of the main search's search terms.

Take this example from my little firewall. My APs run different versions of software (because I upgrade one of the two and usually wait a few days before upgrading the other). If I wanted only the records where each host's version was its latest, I could do the following:

index=fw 
    [ search index=fw 
    | stats max(host_version) as host_version by host 
    | fields host_version host ] 

The subsearch runs and ends up returning a list like

( ( host="AP_Downstairs" AND host_version="v4.3.21.11325" ) OR ( host="AP_Upstairs" AND host_version="v4.3.21.11325" ) OR ( host="curie" ) )

THAT result then gets substituted right into the main search in place of the brackets, so your full search resolves down to

index=fw ( ( host="AP_Downstairs" AND host_version="v4.3.21.11325" ) OR ( host="AP_Upstairs" AND host_version="v4.3.21.11325" ) OR ( host="curie" ) )

If you ever need a different set of AND/OR/() or things grouped differently, there's a 'format' command you can use. It's a little obtuse, but look at the examples: https://docs.splunk.com/Documentation/Splunk/8.0.6/SearchReference/Format
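One way to see exactly what a subsearch will hand back is to run it on its own with format on the end. This is just a sketch using the firewall subsearch from above; it should return a single field named "search" containing the ( ... AND ... ) OR ( ... ) string.

```spl
index=fw
| stats max(host_version) as host_version by host
| fields host_version host
| format
```

With no arguments, format uses the default grouping. As far as I know that is equivalent to spelling out the six arguments as | format "(" "(" "AND" ")" "OR" ")", and changing those strings is how you would get a different grouping.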

If you search for Splunk subsearches, you'll find all sorts of help on them.  Here's a good set of starting points:

The search tutorial's examples: https://docs.splunk.com/Documentation/Splunk/8.0.6/SearchTutorial/Useasubsearch

And about subsearches: https://docs.splunk.com/Documentation/Splunk/8.0.6/Search/Aboutsubsearches

 

happy splunking!

-Rich

richgalloway
SplunkTrust

Are you saying the events for a given "id" field have different "version" field values?  If so, then they are not really duplicate events.

If you still want to get rid of them, there may be other ways to do so besides | delete (which doesn't actually remove anything from disk; it only makes the events unsearchable).

If there is a way to use a regular expression to identify the duplicate events then using a transform to send the unwanted events to nullQueue is better because you are not using license quota for events that will never be seen.

Failing that, using delete may be the final option. Create a scheduled search that runs every few minutes, looks at the previous few minutes for duplicates, and deletes them. The scheduled search must be owned by a user that has the "can_delete" role. Do NOT use this user for any other activity or you risk other data being deleted. CAUTION: Here There Be Dragons. I doubt an auditor will approve of this procedure, so use it only if you are not subject to audits.
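A rough sketch of what such a scheduled search might look like, not a tested recipe. The index=foo and sourcetype=bar names are the placeholders used earlier in this thread, and the two-hour window is purely an assumption. The subsearch finds the highest version for each id, and the NOT in front of it keeps every event that is not that id/version pair, which is what gets piped to delete. The id=* version=* terms are there so events missing either field are left alone.

```spl
index=foo sourcetype=bar id=* version=* earliest=-2h
    NOT [ search index=foo sourcetype=bar id=* version=* earliest=-2h
        | stats max(version) as version by id
        | fields id version ]
| delete
```

Run it without the final | delete first and look at what comes back. Two limits to keep in mind: an older version that falls outside the time window is never compared against a newer one inside it, and subsearches are capped by default (10,000 results and 60 seconds, if I remember the defaults correctly), so a busy source could silently truncate the list of "keepers".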

---
If this reply helps you, Karma would be appreciated.