Hi,
I have noticed an issue in my Splunk environment:
Issue:
Data is getting duplicated on the indexers. When I search from the search head, the same events appear twice. This started today; before that there was no issue with the data.
My Investigations:
1) Checked the application logs to see whether the same log line exists twice at the source. Answer: No.
2) Checked whether this issue is limited to one sourcetype or one index. Answer: No, it affects data in all indexes.
My questions:
Any other reason why this is happening? And what are the steps needed to prevent it?
Thanks in advance.
Regards,
Puneeth
In the case of duplicate-data issues, check the following:
The endpoint below lists all files known to the tailing processor along with their status (read, ignored, blacklisted, etc.):
Link: https://[splunkd_hostname]:[splunkd_port]/services/admin/inputstatus/tailingprocessor:filestatus
If the above checks do not reveal the cause, you can enable DEBUG-level logging for the file-monitoring components to trace what the tailing processor is doing.
To check whether events are duplicated, you can use the following SPL (replace the base search with your own index):
index=<your_index> | eval md=md5(_raw) | stats count by md | where count > 1
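To see the duplicated events themselves rather than just the hashed counts, eventstats keeps the raw events alongside the per-hash count (index name here is a placeholder):

index=<your_index> | eval md=md5(_raw) | eventstats count by md | where count > 1 | table _time host source sourcetype _raw

Looking at the host and source fields of the duplicate pairs usually tells you whether the copies come from two forwarders, two inputs, or one forwarder sending twice.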
For more information, see the community wiki page Troubleshooting Monitor Inputs:
Link: https://wiki.splunk.com/Community:Troubleshooting_Monitor_Inputs
You have mentioned that all your data is getting duplicated; this sounds like a misconfigured outputs.conf.
Can you confirm how your outputs.conf is configured?
Here's an example with two indexers, named indexer1 and indexer2, in an indexer cluster; indexer acknowledgement is turned on, and SSL is not in use:
[tcpout]
defaultGroup = allIndexers
disabled = false
[tcpout:allIndexers]
server = indexer1:9997, indexer2:9997
autoLB = true
useACK = true
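By contrast, a common misconfiguration that duplicates everything is listing two target groups in defaultGroup: Splunk clones every event to each group, so if both groups reach the same indexers, every event is indexed twice. A hypothetical example (group and host names are made up):

[tcpout]
defaultGroup = primary_indexers, secondary_indexers

[tcpout:primary_indexers]
server = indexer1:9997, indexer2:9997

[tcpout:secondary_indexers]
server = indexer1:9997, indexer2:9997

Here each event is cloned to both groups, and since both groups contain the same indexers, each event lands twice. If you only want load balancing across indexers, use a single group with multiple servers, as in the example above.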
# Version 6.5.1
# DO NOT EDIT THIS FILE!
# Changes to default files will be lost on update and are difficult to
# manage and support.
# Please make any changes to system defaults by overriding them in
# apps or $SPLUNK_HOME/etc/system/local
# (See "Configuration file precedence" in the web documentation).
# To override a specific setting, copy the name of the stanza and
# setting to the file where you wish to override it.
[tcpout]
maxQueueSize = auto
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_internal|_introspection|_telemetry)
forwardedindex.filter.disable = false
indexAndForward = false
autoLBFrequency = 30
blockOnCloning = true
compressed = false
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
heartbeatFrequency = 30
maxFailuresPerInterval = 2
secsInFailureInterval = 1
maxConnectionsPerIndexer = 2
forceTimebasedAutoLB = false
sendCookedData = true
connectionTimeout = 20
readTimeout = 300
writeTimeout = 300
tcpSendBufSz = 0
ackTimeoutOnShutdown = 30
useACK = false
blockWarnThreshold = 100
sslQuietShutdown = false
[syslog]
type = udp
priority = <13>
dropEventsOnQueueFull = -1
maxEventSize = 1024
That is the outputs.conf from the default directory.
Perhaps try:
splunk btool outputs list --debug
I'm looking for a good best practices document about duplicate data... found this so far - What are best practices for handling data in a Splunk staging environment that needs to go to produc...
Thank you very much.
Did you check your inputs.conf for two stanzas pointing to the same source?
No, there are no two stanzas pointing to the same source.