Can anyone help clarify why Splunk sometimes indexes duplicate events from one log file?

Path Finder

I've searched through the related questions on Splunkbase, but I have not found out exactly why Splunk sometimes indexes duplicate events. A simple dedup mitigates the issue but does not get to the core of the problem.

My scenario:
I'm indexing multiple logs from a global file system, so my inputs.conf looks like this:

[monitor://global/file/system/apache/log/nodes*/access_log]
index = log_index

The number of duplicate events is not consistent; it is usually between 2 and 12.
Should I add the crcSalt option?
The other option I'm using is maxKBps = 56 on the forwarder. Will this have any impact on the main indexer?
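
For reference, here is a sketch of where the two settings mentioned above would normally live, assuming a standard forwarder configuration layout. crcSalt is an inputs.conf setting, while maxKBps belongs to the [thruput] stanza of limits.conf. The values are the ones from this post, not recommendations, and crcSalt is left commented out on purpose.

```
# inputs.conf on the forwarder
[monitor://global/file/system/apache/log/nodes*/access_log]
index = log_index
# Shown only to illustrate placement. <SOURCE> mixes the full source path
# into the file checksum, so the same file reached through two different
# paths is treated as two different files.
# crcSalt = <SOURCE>

# limits.conf on the forwarder
[thruput]
# Caps how fast this forwarder processes data, in KB per second
maxKBps = 56
```

Because crcSalt = <SOURCE> makes the path part of a file's identity, it may add duplicates rather than remove them when one file is visible under several paths. The thruput limit applies to the instance where it is set, so on a forwarder it should only slow the rate at which the indexer receives data.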

Communicator

Hi,

Check splunkd.log for the following pattern to see whether the forwarder is resending a data block:

WARN TcpOutputProc - Possible duplication of events with channel=

Regards,
Amit Saxena

Super Champion

Links between monitored files or directories will cause duplicates, because the same file is then read under more than one path. Remove the links or blacklist the duplicate paths.
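
A minimal sketch of the blacklist approach, assuming the duplicate view of the logs shows up under a hypothetical directory named nodes_link; substitute whatever path the link actually creates. blacklist is a standard inputs.conf setting that takes a regular expression matched against the file path.

```
# inputs.conf
[monitor://global/file/system/apache/log/nodes*/access_log]
index = log_index
# Skip any file whose path goes through the (hypothetical) linked directory
blacklist = /nodes_link/

# Alternative, if no wanted file is reached through a link:
# followSymlink = false ignores symbolic links found inside a monitored directory
# followSymlink = false
```

Which option fits depends on whether the links are the only route to some of the logs. If they are, blacklisting the redundant path is likely safer than turning off symlink following altogether.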

Path Finder

Hi Amit,

Thanks for the response. I looked at the splunkd.log files on both the forwarder and the indexer and did not see any WARN lines about possible duplicate events. I suspect the cause is the way our global file system is set up, since our logs reside on a global mount that uses symlinks.

Super Champion

I've seen duplicate logs caused by the following:
Is the log being rotated? If so, then monitor only the current log.
Is there a link that duplicates the contents to another monitored directory? If so, then remove the link, or blacklist one of them.
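
A sketch of the "monitor only the current log" idea, assuming rotated copies are named like access_log.1 or access_log.2.gz. That naming scheme is hypothetical, so check how your rotation actually names files. The stanza below is deliberately broader than the one in the question, to show a case where rotated files would otherwise be picked up.

```
# inputs.conf
# Hypothetical broader stanza that would also match rotated files
[monitor://global/file/system/apache/log/nodes*/access_log*]
index = log_index
# Skip rotated copies such as access_log.1 or access_log.2.gz,
# leaving only the live access_log
blacklist = access_log\.\d+(\.gz)?$
```

A stanza that names access_log exactly, like the one in the question, should not match rotated copies in the first place, so this mainly matters when the path or file name contains a trailing wildcard.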
