Getting Data In

Why is data getting duplicated?

puneethgowda
Communicator

Hi ,

We have noticed an issue in my Splunk environment:

Issue:

Data is being duplicated on the indexers. When I run a search on the search head, the same events appear twice. The issue started today; there was no problem with the data before.

My Investigations:

1) Checked whether the application logs contain the same entry twice. Answer: No
2) Checked whether the issue is limited to one sourcetype or one index. Answer: No, it affects data on all indexers.

My questions:

Is there any other reason why this might be happening, and what steps are needed to prevent it?

Thanks in advance.

Regards,
Puneeth


dkolekar_splunk
Splunk Employee

When events are duplicated, check the following:

  1. Whether the source file itself contains duplicate events
  2. Whether two inputs.conf files, or two forwarders, have mistakenly been configured for the same data
  3. Whether the original application intentionally sends the same data to two different channels (e.g. two files)
  4. Whether the forwarder has been made to read a file multiple times, for example through an explicit fishbucket reset or incorrect use of crcSalt
  5. Whether a monitored directory contains symlink loops
  6. Whether forwarder acknowledgement (useACK) is in use, where network failures are expected by design to result in small amounts of duplicated data
  7. Whether summary indexing is being used to intentionally duplicate events in Splunk
  8. Whether the original application has a bug that produces duplicate log entries

The following endpoint lists all files known to the tailing processor along with their status (read, ignored, blacklisted, etc...)
Link: https://[splunkd_hostname]:[splunkd_port]/services/admin/inputstatus/tailingprocessor:filestatus
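
As a rough illustration, the endpoint above could be queried from a script along these lines. This is only a sketch: the management port 8089 is the usual default but is an assumption here, the host name and credentials are placeholders, and the helper names build_request and fetch_file_status are made up for this example. HTTP basic authentication against the management port is assumed.

```python
import base64
import urllib.request

# Path taken from the link above; host, port and credentials are placeholders.
FILE_STATUS_PATH = "/services/admin/inputstatus/tailingprocessor:filestatus"


def build_request(host, username, password, port=8089):
    """Build an authenticated request for the tailing processor file status."""
    url = f"https://{host}:{port}{FILE_STATUS_PATH}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_file_status(host, username, password, port=8089):
    """Return the raw response body. Needs network access to splunkd and a
    certificate that Python trusts."""
    request = build_request(host, username, password, port)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

In the response, a file that shows up under more than one path (for example via a symlink) would be worth a closer look.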

If you cannot resolve the issue after checking the scenarios above, you can enable DEBUG-level logging for the following components:

  1. TailingProcessor
  2. BatchReader
  3. WatchedFile
  4. FileTracker

To check whether events are duplicated, you can append the following SPL to your base search:
| eval md=md5(_raw) | stats count by md | where count > 1
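
The same idea can be tried outside Splunk, for example on events exported to a file. The sketch below mirrors the SPL above: hash each raw event with MD5, count occurrences per hash, and keep only hashes seen more than once. The function name duplicated_events is made up for this example, and events are assumed to be plain strings.

```python
import hashlib
from collections import Counter


def duplicated_events(raw_events):
    """Map the MD5 hex digest of each raw event to its count,
    keeping only digests that occur more than once."""
    counts = Counter(
        hashlib.md5(event.encode("utf-8")).hexdigest() for event in raw_events
    )
    return {digest: count for digest, count in counts.items() if count > 1}
```

An empty result means no two events had identical raw text.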

For more information, see the community wiki page Troubleshooting Monitor Inputs.
Link: https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs

gjanders
SplunkTrust

You mentioned that all your data is being duplicated, which sounds like a misconfigured outputs.conf.
Can you confirm how your outputs.conf is configured?

Here is an example with two indexers, indexer1 and indexer2, in an indexer cluster. Indexer acknowledgement is turned on, and SSL is not in use:
[tcpout]
defaultGroup = allIndexers
disabled = false

[tcpout:allIndexers]
server=indexer1:9997,indexer2:9997
autoLB = true
useACK = true
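
One outputs.conf pattern that can duplicate everything is a defaultGroup that lists more than one target group, because the forwarder then clones data to each group. If the same indexer appears in two of those groups, it receives each event twice. The sketch below checks for that specific case only. It assumes the file is INI-like enough for Python's configparser, and the function name servers_in_multiple_groups is made up for this example.

```python
import configparser


def servers_in_multiple_groups(conf_text):
    """Return {server: [groups]} for servers that appear in more than one
    of the target groups listed in [tcpout] defaultGroup."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep setting names such as defaultGroup case-sensitive
    parser.read_string(conf_text)

    default_groups = parser.get("tcpout", "defaultGroup", fallback="")
    groups = [g.strip() for g in default_groups.split(",") if g.strip()]

    seen = {}
    for group in groups:
        servers = parser.get(f"tcpout:{group}", "server", fallback="")
        # A set avoids counting a server twice within the same group.
        for server in {s.strip() for s in servers.split(",") if s.strip()}:
            seen.setdefault(server, []).append(group)

    return {server: found for server, found in seen.items() if len(found) > 1}
```

For the example configuration above, which has a single target group, this returns an empty dict.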


puneethgowda
Communicator

#Version 6.5.1
#DO NOT EDIT THIS FILE!
#Changes to default files will be lost on update and are difficult to
#manage and support.
#Please make any changes to system defaults by overriding them in
#apps or $SPLUNK_HOME/etc/system/local
#(See "Configuration file precedence" in the web documentation).
#To override a specific setting, copy the name of the stanza and
#setting to the file where you wish to override it.

[tcpout]
maxQueueSize = auto
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)
forwardedindex.filter.disable = false
indexAndForward = false
autoLBFrequency = 30
blockOnCloning = true
compressed = false
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
heartbeatFrequency = 30
maxFailuresPerInterval = 2
secsInFailureInterval = 1
maxConnectionsPerIndexer = 2
forceTimebasedAutoLB = false
sendCookedData = true
connectionTimeout = 20
readTimeout = 300
writeTimeout = 300
tcpSendBufSz = 0
ackTimeoutOnShutdown = 30
useACK = false
blockWarnThreshold = 100
sslQuietShutdown = false

[syslog]
type = udp
priority = <13>
dropEventsOnQueueFull = -1
maxEventSize = 1024

gjanders
SplunkTrust

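That is the outputs.conf from the default directory, not necessarily the configuration that is actually in effect.
To see the merged settings and the file each one comes from, perhaps try: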
splunk btool outputs list --debug
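
To pick out the settings that override the defaults, the btool output can be filtered with a small script. This is a sketch under an assumption about the output format: with --debug, each line is taken to start with the source file path, followed by whitespace and then the stanza header or setting. Paths containing spaces would break this simple split. The function name overriding_settings is made up for this example.

```python
def overriding_settings(btool_output):
    """Return (source_file, setting) pairs for settings that do not come
    from a default directory."""
    overrides = []
    for line in btool_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue  # blank line, or a line without a setting
        source_file, setting = parts[0], parts[1].strip()
        # Skip stanza headers and anything shipped in a default directory.
        if setting.startswith("[") or "/default/" in source_file:
            continue
        overrides.append((source_file, setting))
    return overrides
```

Anything this returns for the [tcpout] stanzas comes from a local or app-level outputs.conf and is the place to look for a misconfiguration.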


ddrillic
Ultra Champion

I'm looking for a good best practices document about duplicate data... found this so far - What are best practices for handling data in a Splunk staging environment that needs to go to produc...


richgalloway
SplunkTrust

For your security, I removed your phone number from the question.

---
If this reply helps you, Karma would be appreciated.

puneethgowda
Communicator

Thank you very much.


PPape
Contributor

Did you check your inputs.conf for two stanzas pointing to the same source?


puneethgowda
Communicator

No, there are no two stanzas pointing to the same source.
