Getting Data In

Why is data getting duplicated?


Hi,

We have noticed an issue in our Splunk environment:


Data is getting duplicated on the indexers. If I run a search on the search head, the same events come back twice. This issue started today; earlier there was no issue with the data.

My Investigations:

1) Checked the application logs to see whether the same log line exists twice. Answer: No.
2) Checked whether this issue affects only one sourcetype or only one index. Answer: No, it is affecting data on all indexers.

My questions:

Is there any other reason why this could be happening? And what steps are needed to prevent it?

Thanks in advance.


Splunk Employee

In cases of duplicated data, we need to check the following:

  1. Whether the source file itself contains duplicate events
  2. Whether two inputs.conf stanzas (or two forwarders) are mistakenly configured to read the same data
  3. Whether the original application intentionally sends the same data to two different channels (e.g., two files)
  4. Behavior where the forwarder is convinced to read a file multiple times, such as an explicit fishbucket reset or incorrect use of crcSalt
  5. Monitoring a directory that contains symlink loops
  6. Use of the forwarding ACK system, where network failures are by design allowed to result in small amounts of duplicated data
  7. Use of summary indexing, which intentionally duplicates events in Splunk
  8. A bug in the original application that produces duplicate log lines
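As an illustration of items 2, 4, and 5, here is a hypothetical inputs.conf where the same physical file is monitored under two paths (one of them a symlink) while crcSalt = &lt;SOURCE&gt; makes Splunk treat each path as a distinct file, so every event is read and indexed twice. All paths and names below are made up for illustration:

```
# inputs.conf - hypothetical overlapping monitor stanzas
# /opt/applogs is assumed to be a symlink to /var/log/app.

[monitor:///var/log/app/app.log]
index = main
sourcetype = app_logs
# crcSalt = <SOURCE> folds the source path into the file's CRC,
# so the same file seen under two paths is tracked as two files.
crcSalt = <SOURCE>

[monitor:///opt/applogs/app.log]
index = main
sourcetype = app_logs
crcSalt = <SOURCE>
```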

The following REST endpoint lists all files known to the tailing processor along with their status (read, ignored, blacklisted, etc.):
Link: https://[splunkd_hostname]:[splunkd_port]/services/admin/inputstatus/tailingprocessor:filestatus
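For example, you can query that endpoint from the command line (the hostname, port, and credentials below are placeholders; the splunkd management port is commonly 8089):

```
# Query the tailing processor's file status via splunkd's REST API.
# -k skips certificate validation; adjust for your environment.
curl -k -u admin:changeme \
  "https://localhost:8089/services/admin/inputstatus/tailingprocessor:filestatus"
```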

If you are not able to rectify the issue through the above scenarios, you can enable DEBUG-level logging for the following components:

  1. TailingProcessor
  2. BatchReader
  3. WatchedFile
  4. FileTracker
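One way to do that (a sketch; using $SPLUNK_HOME/etc/log-local.cfg rather than log.cfg so the change survives upgrades) is to raise the log level of those categories and restart the forwarder:

```
# $SPLUNK_HOME/etc/log-local.cfg
category.TailingProcessor=DEBUG
category.BatchReader=DEBUG
category.WatchedFile=DEBUG
category.FileTracker=DEBUG
```

Remember to revert these to INFO afterwards, as DEBUG is very verbose.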

To check whether events are duplicated, you can use the following SPL (prefix it with a base search; `index=<your_index>` here is a placeholder):
index=<your_index> | eval md=md5(_raw) | stats count by md | where count > 1
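A refinement of the same idea (again with `index=<your_index>` as a placeholder) also shows where each duplicated event came from, which helps narrow down the cause:

```
index=<your_index>
| eval md=md5(_raw)
| stats count values(host) AS hosts values(source) AS sources by md
| where count > 1
```

If one hash shows two different hosts, two forwarders are likely reading the same data; if it shows one host with two sources, the same file is probably being read under two paths.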

For more information, see the community wiki article Troubleshooting Monitor Inputs.


You have mentioned that all your data is getting duplicated; this sounds like a misconfigured outputs.conf.
Can you confirm how your outputs.conf is configured?

Here's an example with two indexers in an indexer cluster, named indexer1 and indexer2 (the hostnames and port are illustrative); indexer acknowledgement is turned on, and SSL is not in use in this example:
[tcpout]
defaultGroup = allIndexers
disabled = false

[tcpout:allIndexers]
server = indexer1:9997, indexer2:9997
autoLB = true
useACK = true

Alerts for Splunk Admins
Version Control for Splunk


# Version 6.5.1
# DO NOT EDIT THIS FILE!
# Changes to default files will be lost on update and are difficult to
# manage and support.
# Please make any changes to system defaults by overriding them in
# apps or $SPLUNK_HOME/etc/system/local
# (See "Configuration file precedence" in the web documentation).
# To override a specific setting, copy the name of the stanza and
# setting to the file where you wish to override it.

[tcpout]
maxQueueSize = auto
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)
forwardedindex.filter.disable = false
indexAndForward = false
autoLBFrequency = 30
blockOnCloning = true
compressed = false
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
heartbeatFrequency = 30
maxFailuresPerInterval = 2
secsInFailureInterval = 1
maxConnectionsPerIndexer = 2
forceTimebasedAutoLB = false
sendCookedData = true
connectionTimeout = 20
readTimeout = 300
writeTimeout = 300
tcpSendBufSz = 0
ackTimeoutOnShutdown = 30
useACK = false
blockWarnThreshold = 100
sslQuietShutdown = false

[syslog]
type = udp
priority = <13>
dropEventsOnQueueFull = -1
maxEventSize = 1024


That is the outputs.conf from the default directory.
Perhaps try:
splunk btool outputs list --debug


Ultra Champion

I'm looking for a good best practices document about duplicate data... found this so far - What are best practices for handling data in a Splunk staging environment that needs to go to produc...



For your security, I removed your phone number from the question.

If this reply helps you, an upvote would be appreciated.


Thank you very much.



Did you check your inputs.conf to see whether there are two stanzas pointing to the same source?
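A quick way to check that (run on the forwarder, with `splunk` invoked from $SPLUNK_HOME/bin) is to list the effective inputs configuration with btool, which also shows which file each stanza came from:

```
# Show every effective inputs.conf setting and its source file,
# then narrow to monitor stanzas to spot overlapping paths.
splunk btool inputs list --debug | grep -i "monitor"
```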



No, two stanzas are not pointing to the same source.
