Splunk Search

Since we're using the batch stanza on our forwarders, how can we manually find duplicate indexed logs?

shahzadarif
Path Finder

We're using the 'batch' stanza on our Splunk forwarders, so they delete log files once they've been indexed. This means we lose Splunk's ability to detect duplicates. We're trying to set something up manually so we can check whether Splunk has already indexed a file before we submit it to the indexers. For a duplicate, both the file name and the file data are exactly the same. Do you have any suggestions or pointers on how to set this up? Thanks.

1 Solution

lavanyaanne
Path Finder

I found a solution through this post:

http://splunk-base.splunk.com/answers/432/how-do-i-find-all-duplicate-events

The search that works is:

sourcetype=* | eval dupfield=_raw | transaction dupfield maxspan=1s keepevicted=true | where mvcount(sourcetype) > 1
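The search above groups events whose raw text is identical and that arrive within one second of each other. As a rough sketch only (it is not part of the original answer), the same idea can be expressed with stats, which counts identical raw events and lists where they came from. The index=* scope and the 24-hour window are assumptions; narrow both to your own data before running it.

```
index=* earliest=-24h
| eval dupfield=_raw
| stats count, values(source) AS sources, values(host) AS hosts BY dupfield
| where count > 1
```

Each result row is one distinct raw event that was indexed more than once within the window, together with the files and hosts that reported it. Unlike the transaction version, this does not require the copies to arrive within one second of each other.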


woodcock
Esteemed Legend

If you are using batch and it is working correctly, it should be impossible for any file's data to be indexed more than once UNLESS the same file appears on more than one forwarder OR the same data appears in more than one file. Splunk has built-in protections to disallow the latter case (see crcSalt), so the most likely situation is that the same file is being sent to more than one forwarder. Do your duplicates have the same host as well (or is host overridden)? Share some more detail about why you think you have duplicate events (I am highly skeptical).
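One way to check the "same file on more than one forwarder" case is to look for file names reported by more than one host. The search below is a sketch under assumptions: index=* and the 24-hour window are placeholders, and it only helps if the host field is not overridden, since an overridden host hides which forwarder sent the data.

```
index=* earliest=-24h
| stats count BY source, host
| stats dc(host) AS forwarder_count, values(host) AS hosts BY source
| where forwarder_count > 1
```

Any source listed here was indexed under at least two different host values, which would be consistent with the same file being picked up by more than one forwarder.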

