Is there a search that I could run that will report 2 or more identical events?

rsimmons · 07-09-2010

I have found duplicates in the search results: identical events from the same host and the same source (file), with exactly the same timestamp. Sometimes there are even more copies, as many as 5.

This is extremely annoying and skews the statistics we gather on how many times certain functionality gets invoked within our environment.

I have seen the problem on multiple sources (files) and hosts.

gooza · 08-01-2011

There is a great app/Python script named remove-duplicate-event-data-from-index
http://splunk-base.splunk.com/apps/22899/remove-duplicate-event-data-from-index
by zpavic that identifies and removes duplicate events. It helped me clean my indexes.
You can also see this answer on running the script for specific dates:
http://splunk-base.splunk.com/answers/27776/is-it-possible-to-run-remove-duplicates-on-specific-date...

BobM · 08-01-2011

To find these events, you can run the following search:

...| eventstats count as duplicate by _raw host _time | where duplicate>1
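The eventstats/where pipeline above attaches a per-group count to every event and then keeps only the events whose group has more than one member. As a rough illustration outside Splunk, here is a minimal Python sketch of the same grouping logic. It assumes events are plain dicts with `_raw`, `host` and `_time` keys; that representation, the helper names and the sample data are hypothetical, not anything Splunk exposes.

```python
from collections import Counter


def event_key(event):
    # Hypothetical grouping key matching "by _raw host _time" in the search.
    return (event["_raw"], event["host"], event["_time"])


def find_duplicates(events):
    """Return every event whose (_raw, host, _time) key occurs more than once.

    Like eventstats, each copy (including the first) is kept and annotated
    with the size of its group in a "duplicate" field.
    """
    counts = Counter(event_key(e) for e in events)
    return [
        dict(e, duplicate=counts[event_key(e)])
        for e in events
        if counts[event_key(e)] > 1
    ]


# Made-up sample events; real ones would come from the index.
events = [
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000},
    {"_raw": "login ok user=b", "host": "web01", "_time": 1278700005},
]

for e in find_duplicates(events):
    print(e["_raw"], e["duplicate"])
```

Note that both copies of the `user=a` event come back. This search only reports duplicates and does not decide which copy is the original.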

As a temporary measure, you can remove the duplicates from each search with the dedup command:

...| dedup _raw host _time
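dedup keeps the first result it sees for each combination of the listed fields and drops the rest. It only filters what the search returns, so the extra copies stay in the index and every search has to pay for them again. A minimal Python sketch of that behaviour, under the same hypothetical dict representation (the function name and sample data are illustrative only):

```python
def dedup(events, fields=("_raw", "host", "_time")):
    """Keep only the first event seen for each combination of `fields`.

    This only filters the result set; in Splunk, dedup likewise leaves the
    indexed copies untouched.
    """
    seen = set()
    kept = []
    for e in events:
        key = tuple(e[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept


events = [
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000},
    {"_raw": "login ok user=b", "host": "web01", "_time": 1278700005},
]

unique = dedup(events)
print(len(events), "events ->", len(unique), "after dedup")
```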

But this is inefficient, so you need to stop new duplicates from being indexed and get rid of the existing ones. If you have multiple indexers, look for data being sent to more than one of them; look for near-duplicate files; avoid using crcSalt in inputs.conf; and so on.
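To see why crcSalt can cause this: a monitor input recognises a file it has already read by a checksum of the file's first bytes (256 by default), and `crcSalt = <SOURCE>` adds the file's full path to that checksum. If a log is rotated by renaming it and the rotated name is also monitored, the salted checksum no longer matches and the whole file is read again. The Python below is a simplified, hypothetical model of that behaviour, not Splunk's implementation. `zlib.crc32` and the tuple fingerprint stand in for Splunk's own checksum, and the paths and content are made up.

```python
import zlib

# Assumption: models the monitor input's default of checksumming the first 256 bytes.
HEAD_BYTES = 256


def fingerprint(path, content, salt_with_source):
    """Simplified, hypothetical model of how a monitor input recognises a file.

    zlib.crc32 stands in for Splunk's own checksum. With salt_with_source=True
    the path becomes part of the fingerprint, modelling crcSalt = <SOURCE>.
    """
    head_crc = zlib.crc32(content[:HEAD_BYTES])
    return (head_crc, path) if salt_with_source else (head_crc,)


def times_indexed(salt_with_source):
    """Count how often identical log content is read across one rotation."""
    content = b"2010-07-09 12:00:00 login ok user=a\n" * 20
    seen = set()
    reads = 0
    # app.log is read, then renamed to app.log.1, and both names are monitored.
    for path in ("/var/log/app/app.log", "/var/log/app/app.log.1"):
        fp = fingerprint(path, content, salt_with_source)
        if fp not in seen:
            seen.add(fp)
            reads += 1
    return reads


print("without crcSalt:", times_indexed(False))
print("with crcSalt = <SOURCE>:", times_indexed(True))
```

In this model the unsalted fingerprint recognises the renamed file as already read, while the salted one treats it as new and reads its content a second time.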

Once you have got rid of the cause, remove the existing duplicates using the following search:

* | eventstats count as duplicates first(_cd) as cd by _raw host _time | where cd!=_cd
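`_cd` is an internal field that, roughly speaking, addresses where an event is stored in the index, so two copies of the same event carry different `_cd` values. The search records the first `_cd` seen in each group and keeps only the events whose own `_cd` differs from it, which leaves exactly the extra copies. A Python sketch of that selection, assuming dict events and made-up `_cd` strings:

```python
def delete_candidates(events):
    """Return every copy except the first in each (_raw, host, _time) group.

    Mirrors the "first(_cd) as cd ... | where cd!=_cd" logic: the first event
    seen for a key is kept, and later copies become candidates for deletion.
    """
    first_cd = {}
    for e in events:
        first_cd.setdefault((e["_raw"], e["host"], e["_time"]), e["_cd"])
    return [
        e for e in events
        if e["_cd"] != first_cd[(e["_raw"], e["host"], e["_time"])]
    ]


# Hypothetical events; the _cd values are invented for illustration.
events = [
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000, "_cd": "12:100"},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000, "_cd": "14:200"},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000, "_cd": "15:300"},
    {"_raw": "login ok user=b", "host": "web01", "_time": 1278700005, "_cd": "12:101"},
]

for e in delete_candidates(events):
    print(e["_cd"])
```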

I have deliberately not appended delete to the above search, as it is good practice to check the data before deleting it. Confirm that the search returns only the duplicate copies and not the originals, then pipe it to delete. You will need to temporarily add the can_delete role to your account for this to work.
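One way to check before deleting is to confirm that, once the candidates are removed, every (_raw, host, _time) key still has exactly one event left, so no event disappears completely. A Python sketch of that check, assuming dict events whose `_cd` values are unique. The candidates would be the results of the eventstats/where search above, and the function name is hypothetical:

```python
from collections import Counter


def safe_to_delete(events, candidates):
    """Return True if removing `candidates` leaves exactly one event per key.

    Keys are (_raw, host, _time); events are matched by _cd, assumed unique.
    """
    def key(e):
        return (e["_raw"], e["host"], e["_time"])

    doomed = {e["_cd"] for e in candidates}
    all_keys = {key(e) for e in events}
    survivors = Counter(key(e) for e in events if e["_cd"] not in doomed)
    # Every key must still be present, and only once.
    return set(survivors) == all_keys and all(n == 1 for n in survivors.values())


events = [
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000, "_cd": "12:100"},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000, "_cd": "14:200"},
    {"_raw": "login ok user=a", "host": "web01", "_time": 1278700000, "_cd": "15:300"},
    {"_raw": "login ok user=b", "host": "web01", "_time": 1278700005, "_cd": "12:101"},
]

print(safe_to_delete(events, events[1:3]))  # only the extra copies: True
print(safe_to_delete(events, events[0:3]))  # would also remove the original: False
```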

...| delete

sideview · 07-10-2010

Sounds like there's a problem with a data input, possibly one that monitors a file or directory and, for some reason, believes the entire file has changed. Is there any common pattern to the files or directories where it's occurring?

As Simeon says, you can use the dedup command to mask the problem, but the root cause should be fixable.

Simeon · 07-10-2010

The dedup command can be used to create results that do not include duplicates:

http://www.splunk.com/base/Documentation/latest/SearchReference/Dedup
