Splunk Search

How do I check for the existence of an event before indexing to avoid duplicate events?

andrewtrobec
Motivator

Hello,

I'm busy trying to find a way to ensure that duplicate records are not indexed. So far all I've managed to do is find a search that will remove duplicate values once they have been indexed (and consumed license):

index="index_name"
| eval eid=_cd
| search [ search index="index_name"
 | streamstats count by _raw
 | search count>1
 | eval eid=_cd
 | fields eid ]
| delete

Is there any way in transforms.conf or props.conf to check for the existence of an event before deciding to index?

Thank you and best regards,

Andrew


gcusello
SplunkTrust

Hi andrewtrobec,
No, there is no way to configure Splunk for this. Splunk already checks whether it has already indexed a file (the fishbucket), but if the same event appears in two different files, it is indexed twice!

The only approach I can think of (though I haven't tried it!) is to use the SDK to run a search that checks whether an event already exists before indexing it. As you can imagine, this would make ingestion very slow. In addition, what time period would the check search cover: one minute, one hour, or more? There's a high risk of overloading your system, so paying the license for the duplicates will probably cost less than running this check!

Deleting the redundant logs that way is also dangerous, because you risk deleting good events! It would probably be better to dedup the results at search time. Remember that the "delete" command doesn't free any disk space, because it's a logical deletion, not a physical one!
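
As a rough illustration of deduplicating at search time, here is a minimal sketch. It reuses the index_name placeholder from the question and assumes that "duplicate" means two events with identical raw text:

```spl
index="index_name"
| dedup _raw
```

The dedup command keeps the first event the search returns for each distinct _raw value and drops the others from the results only. Nothing is removed from the index, so there is no risk of losing good events.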

I think you should first check how much extra license the duplicates actually consume, which probably isn't much, and then try to re-engineer your inputs.
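
One possible way to get a rough figure for that overhead is sketched below. It assumes duplicates are events with identical _raw, and the field names extra_events, dup_chars, total_extra_events and approx_dup_chars are made up for this example:

```spl
index="index_name"
| stats count by _raw
| where count > 1
| eval extra_events = count - 1, dup_chars = extra_events * len(_raw)
| stats sum(extra_events) as total_extra_events, sum(dup_chars) as approx_dup_chars
```

Note that len() counts characters rather than bytes, so this is only an approximation of the license volume. Grouping by _raw can also be expensive, so it is probably best to run it over a limited time range first.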

Bye.
Giuseppe

andrewtrobec
Motivator

@gcusello Thanks for the information, very useful. Is there a way to physically delete logically deleted events?

gcusello
SplunkTrust

Hi andrewtrobec,
No, as far as I know, the only way to physically delete events from an index is the "splunk clean eventdata -index index_name" command, but that deletes the entire index.

Otherwise you have to wait for the retention period to expire!
For this reason the delete command isn't a good way to remove events; it's better to keep them and dedup at search time!

Bye.
Giuseppe
