I've been all over related questions on Splunk Answers, but I have not found out exactly why Splunk will sometimes index duplicate events. A simple dedup helps mitigate the issue, but it does not get to the root of the problem.
My Scenario:
I'm indexing multiple logs from a global file system, so my inputs.conf looks like this:
[monitor://global/file/system/apache/log/nodes*/access_log]
index = log_index
The number of duplicate events is not consistent; it is usually between 2 and 12.
Should I add the crcSalt option?
The other option I'm using is setting maxKBps = 56 on the forwarder. Will this have any impact on the main indexer?
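For reference, here is roughly where those two settings live. crcSalt goes in the monitor stanza in inputs.conf, and maxKBps goes in the [thruput] stanza of limits.conf on the forwarder. This is a sketch only; the stanza path mirrors the one above, and the literal value <SOURCE> tells Splunk to mix the full file path into the initial CRC so files with identical headers are not mistaken for each other:

```ini
# inputs.conf on the forwarder (sketch)
[monitor://global/file/system/apache/log/nodes*/access_log]
index = log_index
# <SOURCE> is a literal keyword: salt the CRC with the file's full path
crcSalt = <SOURCE>

# limits.conf on the forwarder (sketch)
[thruput]
# throttle forwarder output to 56 KB/s
maxKBps = 56
```

One caution: with crcSalt = <SOURCE>, a rotated file that is renamed gets a new salted CRC and is re-read from the beginning, which can itself produce duplicates, so it fits best when log files are never renamed. maxKBps only throttles the forwarder's send rate; it delays delivery rather than causing or preventing duplicates.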
This might answer your question:
http://answers.splunk.com/answers/117947/splunk-adds-filepart-to-file-name.html
Hi,
Check splunkd.log for the following pattern to see whether the forwarder is resending a data block:
WARN TcpOutputProc - Possible duplication of events with channel=
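On a Unix forwarder you could grep for that pattern with something like the following. The splunkd.log path shown in the comment is the typical default and may differ on your install; the demo lines below run against a sample file so the snippet is self-contained:

```shell
# Real check (typical path; adjust $SPLUNK_HOME for your install):
#   grep "Possible duplication of events" "$SPLUNK_HOME"/var/log/splunk/splunkd.log

# Self-contained demo: write one sample warning line, then count matches.
printf '01-01-2014 12:00:00.000 WARN TcpOutputProc - Possible duplication of events with channel=source::/global/file/system/apache/log/nodes01/access_log\n' > /tmp/splunkd_sample.log
grep -c "Possible duplication of events" /tmp/splunkd_sample.log
```

A nonzero count means the forwarder reconnected mid-block and resent data, which the indexer may have already written.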
Regards,
Amit Saxena
Links between monitored files or directories will cause duplicates. Remove the links, or blacklist one of the duplicated paths.
Hi Amit,
Thanks for the response. I took a look at both splunkd.log files (forwarder and indexer) and did not see any WARN lines about possible duplicate events. I suspect it may be the way our global file system is set up, since our logs reside on a global mount accessed through symlinks.
I've seen duplicate logs caused by the following:
Is the log being rotated? If so, then monitor only the current log.
Is there a link that duplicates the contents to another monitored directory? If so, then remove the link, or blacklist one of them.
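Both cases above can be handled with whitelist/blacklist in the monitor stanza. A sketch, assuming the nodes directories from the original stanza and a conventional access_log.1, access_log.2 rotation scheme (adjust the regexes to your actual naming):

```ini
# inputs.conf (sketch): monitor the node directories, but only read the
# live access_log, not rotated copies
[monitor://global/file/system/apache/log/nodes*]
index = log_index
# whitelist/blacklist are regexes matched against the full file path
whitelist = access_log$
blacklist = access_log\.\d+$
```

If symlinks cause the same file to be reachable under two monitored paths, a blacklist matching one of the paths works the same way.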