We're seeing an issue on our indexer cluster where ~25% of events are duplicated. The raw logs do not contain duplicates, and there are no duplicate or overlapping monitor stanzas. The bucket ID, index time, and Splunk server are identical across each set of duplicates.
Our indexers are clustered, and we're running Enterprise version 6.6.3 on Windows Server 2012 R2.
Here's our aggregated outputs.conf from a Universal Forwarder:
\splunk btool outputs list
[syslog]
maxEventSize = 1024
priority = <13>
type = udp
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
autoLBVolume = 0
blockOnCloning = true
blockWarnThreshold = 100
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
compressed = false
connectionTimeout = 20
defaultGroup = gd_indexers
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
ecdhCurves = prime256v1, secp384r1, secp521r1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = auto
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
sslVersions = tls1.2
tcpSendBufSz = 0
useACK = true
writeTimeout = 300
[tcpout:gd_indexers]
server = <List of Internal IPs>
If anyone can suggest an avenue for troubleshooting, it would be greatly appreciated. Please also let me know if I can provide more relevant information.
You said you looked at indextime. Did that include looking at the indextime for both copies of the same event?
Pick a few events that are duplicated and look at any differences between the events.
indextime, host, splunk_server... is there anything you can see that differs?
...Laurie:{)
Is there any difference between the original and its duplicate(s)?
i.e. for each event, how does it differ from its duplicate? Is there only one duplicate of each event, or more than one?
I used the following search:
index=<my_index> sourcetype=<my_sourcetype>
| eval bucket=_bkt
| eval indextime=_indextime
| table _time, indextime, bucket splunk_server _raw
| convert ctime(indextime)
| stats count list(*) as * by _raw
| where count>1
| fields * _raw
For each set of duplicate events, the indextime field showed the same value repeated once per copy, and the same was true of bucket and splunk_server.
There appears to be no difference between duplicates. Most of the time there are just two copies of an event, but occasionally there are 3 to 5 copies on one indexer. It's not always the same indexer either; the duplicates seem relatively evenly distributed across the cluster.
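For anyone who wants to sanity-check the grouping outside Splunk, the logic of the search above can be roughly sketched in Python. This is only an illustration, not what Splunk does internally: it assumes the events were exported as dictionaries with the keys `_raw`, `indextime`, `bucket`, and `splunk_server` (the field names from the search), and the function name `find_duplicates` is made up. Unlike `list(*)`, it collects distinct values per field, so a single value in each list means the copies are indistinguishable on that field.

```python
from collections import defaultdict


def find_duplicates(events):
    """Group events by _raw and report those that occur more than once.

    `events` is assumed to be a list of dicts with the keys
    "_raw", "indextime", "bucket" and "splunk_server".
    """
    groups = defaultdict(list)
    for event in events:
        groups[event["_raw"]].append(event)

    report = {}
    for raw, copies in groups.items():
        if len(copies) < 2:
            continue  # same effect as `where count>1` in the search
        report[raw] = {
            "count": len(copies),
            # Distinct values per field; one value means every copy agrees.
            "indextime": sorted({c["indextime"] for c in copies}),
            "bucket": sorted({c["bucket"] for c in copies}),
            "splunk_server": sorted({c["splunk_server"] for c in copies}),
        }
    return report


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the cluster in question.
    sample = [
        {"_raw": "event A", "indextime": 100, "bucket": "idx~1~G1", "splunk_server": "idx01"},
        {"_raw": "event A", "indextime": 100, "bucket": "idx~1~G1", "splunk_server": "idx01"},
        {"_raw": "event B", "indextime": 101, "bucket": "idx~2~G2", "splunk_server": "idx02"},
    ]
    for raw, info in find_duplicates(sample).items():
        print(raw, info)
```

If every field list in the report holds exactly one value, that matches what I described: the copies share index time, bucket, and indexer.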
Found this: https://answers.splunk.com/answers/365914/why-are-we-seeing-duplicate-events-found-in-an-ind.html
An additional incentive to upgrade?
We're working on upgrading to 7.2.x as soon as we can get it scheduled. The linked question looks like it's talking about 6.4 as the solution; we're on 6.6. Appreciate you taking the time to post a suggestion though!