Getting Data In

Why is data getting duplicated?

Communicator

Hi,

I have noticed an issue in my Splunk environment:

Issue:

Data is being indexed twice on the indexers. When I run a search on the search head, every event appears twice. This issue started today; there was no problem with the data before.

My Investigations:

1) Checked the application logs to see whether the same log line exists twice. Answer: No.
2) Checked whether the issue is limited to one sourcetype or one index. Answer: No, it affects data in all indexes.

My questions:

Is there any other reason why this could be happening, and what steps are needed to prevent it?

Thanks in advance.

Regards,
Puneeth


Re: Why is data getting duplicated?

Contributor

Did you check your inputs.conf to see whether two stanzas point to the same source?


Re: Why is data getting duplicated?

Communicator

No, there are no two stanzas pointing to the same source.


Re: Why is data getting duplicated?

SplunkTrust

For your security, I removed your phone number from the question.

---
If this reply helps you, an upvote would be appreciated.

Re: Why is data getting duplicated?

Communicator

Thank you very much.


Re: Why is data getting duplicated?

Ultra Champion

I'm looking for a good best practices document about duplicate data... found this so far - What are best practices for handling data in a Splunk staging environment that needs to go to produc...


Re: Why is data getting duplicated?

SplunkTrust

You mentioned that all of your data is getting duplicated; that sounds like a misconfigured outputs.conf.
Can you confirm how your outputs.conf is configured?

Here's an example with two indexers, indexer1 and indexer2, which are in an indexer cluster; indexer acknowledgement is turned on, and SSL is not in use:
[tcpout]
defaultGroup = allIndexers
disabled = false

[tcpout:allIndexers]
server=indexer1:9997,indexer2:9997
autoLB = true
useACK = true
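
For contrast, one misconfiguration that duplicates every event is listing two target groups in defaultGroup: Splunk clones the data to each group, so if both groups resolve to the same indexers, everything is indexed twice. A hypothetical sketch (group names are illustrative, not from your environment):

[tcpout]
# Two groups here means every event is CLONED to both groups.
# If both groups point at the same indexers, each event is indexed twice.
defaultGroup = groupA, groupB

[tcpout:groupA]
server = indexer1:9997,indexer2:9997

[tcpout:groupB]
server = indexer1:9997,indexer2:9997

Cloning to multiple groups is a legitimate feature when the groups are different destinations (e.g. production plus a test cluster); it only causes duplicates when the groups overlap.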


Re: Why is data getting duplicated?

Communicator

# Version 6.5.1
#
# DO NOT EDIT THIS FILE!
#
# Changes to default files will be lost on update and are difficult to
# manage and support.
#
# Please make any changes to system defaults by overriding them in
# apps or $SPLUNK_HOME/etc/system/local
# (See "Configuration file precedence" in the web documentation).
#
# To override a specific setting, copy the name of the stanza and
# setting to the file where you wish to override it.

[tcpout]
maxQueueSize = auto
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = .*
forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)
forwardedindex.filter.disable = false
indexAndForward = false
autoLBFrequency = 30
blockOnCloning = true
compressed = false
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
heartbeatFrequency = 30
maxFailuresPerInterval = 2
secsInFailureInterval = 1
maxConnectionsPerIndexer = 2
forceTimebasedAutoLB = false
sendCookedData = true
connectionTimeout = 20
readTimeout = 300
writeTimeout = 300
tcpSendBufSz = 0
ackTimeoutOnShutdown = 30
useACK = false
blockWarnThreshold = 100
sslQuietShutdown = false

[syslog]
type = udp
priority = <13>
dropEventsOnQueueFull = -1
maxEventSize = 1024


Re: Why is data getting duplicated?

SplunkTrust

That is the outputs.conf from the default directory.
Perhaps try:
splunk btool outputs list --debug


Re: Why is data getting duplicated?

Splunk Employee

In case of duplicate events, we need to check the following:

  1. Whether the source file itself contains duplicate events
  2. Whether two inputs.conf stanzas, or two forwarders, were mistakenly configured to read the same data
  3. Whether the original application intentionally sends the same data to two different channels (e.g. two files)
  4. Behavior where the forwarder is convinced to read a file multiple times, such as an explicit fishbucket reset or incorrect use of crcSalt
  5. Monitoring a directory that contains symlink loops
  6. Use of the forwarder acknowledgement system (useACK), where network failures are intentionally allowed to produce small amounts of duplicated data rather than risk data loss
  7. Use of summary indexing, which intentionally duplicates events in Splunk
  8. A bug in the original application that produces duplicated log lines

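As an illustration of item 4: adding crcSalt to a monitor stanza that is already indexing data changes the checksum Splunk uses to track files, so every previously indexed file is treated as new and re-read in full, duplicating its events once. A hypothetical inputs.conf sketch (the path and sourcetype are illustrative):

[monitor:///var/log/myapp]
sourcetype = myapp
# Adding this to a stanza that was already indexing data changes the
# file checksum, so existing files are re-read from the beginning.
crcSalt = <SOURCE>
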
The following REST endpoint lists all files known to the tailing processor along with their status (read, ignored, blacklisted, etc.):
Link: https://[splunkdhostname]:[splunkdport]/services/admin/inputstatus/tailingprocessor:filestatus

If you cannot rectify the issue with the above checks, you can enable DEBUG logging for the following components:

  1. TailingProcessor
  2. BatchReader
  3. WatchedFile
  4. FileTracker
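
One way to raise those components to DEBUG is to override their log categories in $SPLUNK_HOME/etc/log-local.cfg and restart splunkd (a sketch, assuming the default category names from log.cfg; you can also change them temporarily under Settings > Server settings > Server logging without a restart):

[splunkd]
category.TailingProcessor=DEBUG
category.BatchReader=DEBUG
category.WatchedFile=DEBUG
category.FileTracker=DEBUG

Remember to revert these to INFO afterwards, as DEBUG logging on file inputs is very noisy.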

To check whether events are duplicated, you can use the following SPL:
| eval md=md5(_raw) | stats count by md | where count > 1
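
To see where the duplicates come from rather than just counting them, the same idea can be extended (source, host, and _time are standard default fields; md is the computed hash):

| eval md=md5(_raw)
| stats count values(source) as sources values(host) as hosts earliest(_time) as first latest(_time) as last by md
| where count > 1

If the duplicate pairs share a source but differ in host, suspect two forwarders reading the same file; if they share both, suspect the output path or an input re-read.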

For more information, see the community wiki page Troubleshooting Monitor Inputs:
Link: https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs