Splunk Search

Is there a search that I could run that will report 2 or more identical events?

rsimmons
Splunk Employee

My search results contain duplicates: identical events from the same host and the same source (file) with exactly the same timestamp. Sometimes there are even more copies, as many as 5.

This is extremely annoying and messes up the statistics we gather on how many times certain functionality gets invoked within our environment.

I have seen the problem on multiple sources (files) and hosts.

gooza
Communicator

There is a great app/Python script named remove-duplicate-event-data-from-index, by zpavic, that identifies and removes duplicate events:
http://splunk-base.splunk.com/apps/22899/remove-duplicate-event-data-from-index
It helped me clean my indexes.
See also this answer on using the script for specific dates:
http://splunk-base.splunk.com/answers/27776/is-it-possible-to-run-remove-duplicates-on-specific-date...

BobM
Builder

To find these events, you can run the following search:

...| eventstats count as duplicate by _raw host _time | where duplicate>1

As a temporary measure, you can remove the duplicates from each search with the dedup command:

...| dedup _raw host _time

BUT this is inefficient, so you need to prevent the duplicates and get rid of the existing ones. If you have multiple indexers, look for data going to more than one of them; look for near-duplicate files; avoid using crcSalt in inputs.conf; and so on.
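
One way to check the multiple-indexer case is to count how many indexers and sources hold each duplicated event. This is only a sketch: it assumes the default splunk_server and source fields are populated for your data, and the leading ... stands for your own base search. The output field names (indexers, servers, sources) are just illustrative labels.

```spl
... | stats count dc(splunk_server) as indexers values(splunk_server) as servers dc(source) as sources by _raw host _time
    | where count>1
```

If indexers is greater than 1, the same data is probably reaching more than one indexer; if sources is greater than 1, near-duplicate files are a more likely cause.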

Once you have eliminated the cause, find the remaining duplicates using the following search:

* | eventstats count as duplicates first(_cd) as cd by _raw host _time | where cd!=_cd

I have deliberately not joined the delete to the above search, as it is good practice to check the data before deleting it. Confirm that the search returns only the duplicates and not the originals, then pipe it to delete. You will need to temporarily add the can_delete role to your account for this to work.

...| delete

sideview
SplunkTrust

Sounds like there's a problem with a data input, possibly where an input is monitoring a file or directory and believes for some reason that the entire file has changed. Is there any common pattern to the files or directories where it's occurring?

As Simeon says, you can use the dedup command to mask it, but the root cause should be fixable.

Simeon
Splunk Employee

The dedup command can be used to create results that do not include duplicates:

http://www.splunk.com/base/Documentation/latest/SearchReference/Dedup
