We're seeing an issue on our indexer cluster where roughly 25% of events are duplicated. The raw logs do not contain duplicates, and there are no duplicate or overlapping monitor stanzas. The bucket ID, index time, and splunk_server values are identical across the duplicates.
Our indexers are clustered, and we're running Splunk Enterprise 6.6.3 on Windows Server 2012 R2.
Here's our aggregated outputs.conf from a Universal Forwarder:
\splunk btool outputs list
[syslog]
maxEventSize = 1024
priority = <13>
type = udp
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
autoLBVolume = 0
blockOnCloning = true
blockWarnThreshold = 100
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
compressed = false
connectionTimeout = 20
defaultGroup = gd_indexers
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
ecdhCurves = prime256v1, secp384r1, secp521r1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = auto
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
sslVersions = tls1.2
tcpSendBufSz = 0
useACK = true
writeTimeout = 300
[tcpout:gd_indexers]
server = <List of Internal IPs>
If anyone can suggest an avenue for troubleshooting, it would be greatly appreciated. Please also let me know if I can provide more relevant information.
You said you looked at indextime. Did that include looking at the indextime for both copies of the same event?
Pick a few duplicated events and look for any differences between the copies.
Compare indextime, host, splunk_server, and so on. Is anything different?
I used the following search:
index=<my_index> sourcetype=<my_sourcetype> | eval bucket=_bkt | eval indextime=_indextime | table _time, indextime, bucket splunk_server _raw | convert ctime(indextime) | stats count list(*) as * by _raw | where count>1 | fields * _raw
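If it helps to double-check the search offline, here is a minimal Python sketch that does the same grouping on exported results. It assumes the search output (before the stats step) was exported to CSV with the columns _time, indextime, bucket, splunk_server, and _raw; the sample rows and the function name summarize_duplicates are hypothetical and only for illustration. For each _raw that appears more than once, it reports the number of copies and whether every copy shares the same indextime, bucket, and splunk_server.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of exported results; column names follow the table
# command in the search above, but the values are made up.
SAMPLE_CSV = """_time,indextime,bucket,splunk_server,_raw
2019-01-10 08:00:00,1547107201,my_index~42~GUID,idx01,event A
2019-01-10 08:00:00,1547107201,my_index~42~GUID,idx01,event A
2019-01-10 08:00:05,1547107206,my_index~43~GUID,idx02,event B
"""


def summarize_duplicates(rows):
    """Group rows by _raw and describe every event that appears more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["_raw"]].append(row)

    summary = {}
    for raw, copies in groups.items():
        if len(copies) < 2:
            continue  # event is not duplicated
        # Distinct values of each metadata field across all copies
        fields = {
            name: sorted({copy[name] for copy in copies})
            for name in ("indextime", "bucket", "splunk_server")
        }
        summary[raw] = {
            "copies": len(copies),
            # True when every copy has the same indextime, bucket and indexer
            "identical": all(len(values) == 1 for values in fields.values()),
            **fields,
        }
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for raw, info in summarize_duplicates(rows).items():
    print(raw, info)
```

An "identical": True result for most duplicates would match what the search showed: the copies were written to the same bucket on the same indexer at the same index time.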
In the indextime field, every copy of a duplicated event shows the same value, and the same is true for bucket and splunk_server.
There appears to be no difference between the copies. Most duplicated events have two copies, though occasionally there are three to five copies on a single indexer. It's not always the same indexer, either; the duplicates seem fairly evenly distributed across the cluster.
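To put numbers on "mostly two copies" and "evenly distributed", here is a small Python sketch, assuming the events have been exported as rows with _raw and splunk_server fields (the sample rows and the name duplicate_distribution are hypothetical). It builds a histogram of how many copies each duplicated event has, and counts how many duplicated events each indexer holds.

```python
from collections import Counter, defaultdict


def duplicate_distribution(rows):
    """Return (histogram of copy counts, duplicated-event count per indexer)."""
    servers_by_raw = defaultdict(list)
    for row in rows:
        servers_by_raw[row["_raw"]].append(row["splunk_server"])

    copies_hist = Counter()  # number of copies -> number of events with that many
    per_server = Counter()   # indexer -> number of duplicated events it holds
    for servers in servers_by_raw.values():
        if len(servers) > 1:
            copies_hist[len(servers)] += 1
            for server in set(servers):
                per_server[server] += 1
    return copies_hist, per_server


# Hypothetical sample data for illustration only
rows = [
    {"_raw": "event A", "splunk_server": "idx01"},
    {"_raw": "event A", "splunk_server": "idx01"},
    {"_raw": "event B", "splunk_server": "idx02"},
    {"_raw": "event B", "splunk_server": "idx02"},
    {"_raw": "event B", "splunk_server": "idx02"},
    {"_raw": "event C", "splunk_server": "idx03"},
]
hist, per_server = duplicate_distribution(rows)
print(dict(hist))        # {2: 1, 3: 1}
print(dict(per_server))  # {'idx01': 1, 'idx02': 1}
```

A roughly flat per_server count would support the observation that no single indexer is responsible.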
We're working on upgrading to 7.2.x as soon as we can get it scheduled. The linked question appears to point to 6.4 as the fix, but we're already on 6.6. I appreciate you taking the time to post a suggestion, though!