We're using a 'batch' input stanza on our Splunk forwarders, so they delete the log files once they've been indexed. Obviously, this means we've lost Splunk's ability to detect duplicates. We're trying to set something up manually so we can detect whether Splunk has already indexed a file before submitting it to the indexers. For a duplicate, the file name and file data are exactly the same. Do you have any suggestions or pointers on how to set this up? Thanks.
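One possible approach, sketched under assumptions and not a Splunk feature: before a file is dropped into the directory that the batch stanza reads, fingerprint it over its name plus its contents (matching the duplicate definition above) and compare that fingerprint against a ledger of files already submitted. The ledger file, the staging location, and the helper names (fingerprint, load_ledger, submit_if_new) are hypothetical, chosen only for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 over the file name and contents (the duplicate definition in the question)."""
    h = hashlib.sha256()
    h.update(path.name.encode("utf-8"))
    h.update(b"\0")  # separator so the name and the data cannot run together
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_ledger(ledger: Path) -> set:
    """Read previously submitted fingerprints (one hex digest per line)."""
    if not ledger.exists():
        return set()
    return set(ledger.read_text(encoding="utf-8").split())


def submit_if_new(src: Path, batch_dir: Path, ledger: Path) -> bool:
    """Hypothetical helper: move src into batch_dir unless an identical file
    (same name, same bytes) was submitted before.

    Returns True if the file was handed to the batch directory, False if it
    was recognized as a duplicate and left where it is.
    """
    fp = fingerprint(src)
    if fp in load_ledger(ledger):
        return False
    # On the same filesystem shutil.move is a plain rename, so the forwarder
    # should not see a partially written file.
    shutil.move(str(src), str(batch_dir / src.name))
    # The ledger must live outside batch_dir, because batch deletes what it reads.
    with ledger.open("a", encoding="utf-8") as out:
        out.write(fp + "\n")
    return True
```

Keeping the staging directory on the same filesystem as the batch directory is the main design choice here: it turns the move into a rename instead of a copy. Files rejected as duplicates stay in the staging directory so someone can inspect them.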
I found a solution through this thread:
http://splunk-base.splunk.com/answers/432/how-do-i-find-all-duplicate-events
The search that works is:
sourcetype=* | eval dupfield=_raw | transaction dupfield maxspan=1s keepevicted=true | where mvcount(sourcetype) > 1
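For comparison, the idea behind that search (two events are duplicates when their _raw text is identical) can be sketched in Python over exported events. The event dict shape and the find_duplicate_events name are hypothetical. Unlike the search, this sketch ignores timestamps (there is no equivalent of maxspan=1s) and counts the events in each group directly. That difference matters: transaction keeps only distinct values in multivalued fields by default (mvlist=false), so mvcount(sourcetype) > 1 flags a group only when its copies carry different sourcetypes. Listing the hosts and sources per group also shows whether the copies came from different machines or files.

```python
from collections import defaultdict


def find_duplicate_events(events):
    """Group events by their exact raw text and report groups seen more than once.

    Each event is assumed (hypothetically) to be a dict with a "_raw" key and
    optional "host" and "source" keys, e.g. rows exported from a search.
    Timestamps are ignored; only the raw text decides what counts as a duplicate.
    """
    groups = defaultdict(list)
    for ev in events:
        groups[ev["_raw"]].append(ev)

    report = {}
    for raw, evs in groups.items():
        if len(evs) > 1:  # count occurrences, not distinct field values
            report[raw] = {
                "count": len(evs),
                "hosts": sorted({e.get("host", "") for e in evs}),
                "sources": sorted({e.get("source", "") for e in evs}),
            }
    return report
```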

If you are using batch and it is working correctly, it should be impossible for any file's data to be indexed more than once UNLESS the same file appears on more than one forwarder OR the same data appears in more than one file. Splunk has built-in protections that prevent the latter case (see crcSalt), so the most likely situation is that the same file is being sent to more than one forwarder. Do your duplicates also have the same host (or is host overridden)? Share some more detail about why you think you have duplicate events (I am highly skeptical).
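To illustrate the crcSalt protection: according to the inputs.conf documentation, a file input identifies a file by a CRC of its first initCrcLength bytes (256 by default), and crcSalt mixes an extra string into that CRC, for example the full path when set to crcSalt = <SOURCE>. The sketch below is a simplified model of that idea, not Splunk's actual implementation; file_signature and SeenFiles are hypothetical names.

```python
import zlib

INIT_CRC_LENGTH = 256  # mirrors the documented inputs.conf default for initCrcLength


def file_signature(data: bytes, salt: str = "") -> int:
    """Illustrative CRC over the first INIT_CRC_LENGTH bytes plus an optional salt.

    NOT Splunk's real algorithm: it only models the idea that files with
    identical leading bytes get identical signatures unless a salt
    (such as the file path) is mixed in.
    """
    crc = zlib.crc32(data[:INIT_CRC_LENGTH])
    if salt:
        crc = zlib.crc32(salt.encode("utf-8"), crc)
    return crc


class SeenFiles:
    """Hypothetical record of signatures already indexed."""

    def __init__(self):
        self._seen = set()

    def should_index(self, data: bytes, salt: str = "") -> bool:
        sig = file_signature(data, salt)
        if sig in self._seen:
            return False  # same leading bytes (and salt) as an earlier file
        self._seen.add(sig)
        return True
```

Without a salt, a second file whose first 256 bytes match an earlier one is skipped; salting with the path makes the same contents under a different path look new. In this simplified model anything after the first 256 bytes is ignored, so appended data does not change the signature.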
