When troubleshooting duplicate events, we need to check the following:
Whether the source file contains duplicate events
Whether the same input is mistakenly configured in two inputs.conf files, or two forwarders are monitoring the same data (see the btool sketch after this list)
Whether the original application intentionally sends the same data to two different channels (e.g., two files)
Behavior where the forwarder is convinced to read a file multiple times, such as an explicit fishbucket reset or incorrect use of crcSalt (see the inputs.conf sketch after this list)
Monitoring a directory that contains symlink loops
Use of the forwarder ACK system (useACK), where network failures are expected, by design, to result in small amounts of duplicated data
Use of summary indexing, which intentionally duplicates events in Splunk
The original application may have a bug that duplicates log entries
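To rule out overlapping inputs, one quick check is to dump the merged inputs configuration with btool, which also shows which file each stanza comes from; a minimal sketch (the grep filter is only there to narrow the output):

$SPLUNK_HOME/bin/splunk btool inputs list --debug | grep monitor

If the same path appears under two different monitor:// stanzas, or in two different apps, the input is effectively configured twice.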
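On the crcSalt point: it is set per monitor stanza in inputs.conf, and changing it (like resetting the fishbucket) makes Splunk treat already-read files as new, so they are re-indexed from the beginning. A hypothetical stanza (the path is a placeholder):

[monitor:///var/log/myapp/app.log]
# <SOURCE> salts the CRC with the full path; changing this value re-reads the file
crcSalt = <SOURCE>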
The following endpoint lists all files known to the tailing processor, along with their status (read, ignored, blacklisted, etc.):
Link: https://[splunkd_hostname]:[splunkd_port]/services/admin/inputstatus/tailingprocessor:filestatus
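A quick way to query that endpoint, assuming the default management port 8089 and admin credentials (host, user, and password are placeholders):

curl -k -u admin:changeme https://localhost:8089/services/admin/inputstatus/TailingProcessor:FileStatus

Each monitored file is listed with its read position and parent stanza, which should help confirm whether one file is being read by more than one input.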
If the above scenarios do not explain the issue, you can enable DEBUG logging for the following components (see the sketch after this list):
TailingProcessor
BatchReader
WatchedFile
FileTracker
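One way to raise these to DEBUG, assuming CLI access on the instance doing the file reading (the change lasts until restart; edit $SPLUNK_HOME/etc/log.cfg to persist it):

$SPLUNK_HOME/bin/splunk set log-level TailingProcessor -level DEBUG
$SPLUNK_HOME/bin/splunk set log-level WatchedFile -level DEBUG

The extra detail shows up in splunkd.log.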
To check whether events are duplicated, you can use the following SPL:
| eval md=md5(_raw) | stats count by md | where count > 1
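For example, run it against whatever base search covers the suspect data (the index and source here are placeholders); adding the sources and hosts makes it easier to see where each duplicate enters:

index=main source="/var/log/myapp/app.log"
| eval md=md5(_raw)
| stats count, values(source) AS sources, values(host) AS hosts by md
| where count > 1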
For more information, see the community wiki page Troubleshooting Monitor Inputs:
Link: https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs