Re: What causes duplicate data in an indexer cluster?

schofiet · 11-15-2018

We're seeing an issue on our indexer cluster where ~25% of events are duplicated. The raw logs do not contain duplicates, and there are no duplicate or overlapping monitor stanzas. The bucket ID, index time, and splunk_server are all identical across the duplicates.

Our indexers are clustered, and we're running Enterprise version 6.6.3 on Windows Server 2012 R2.

Here's our aggregated outputs.conf from a Universal Forwarder:

\splunk btool outputs list

[syslog]
maxEventSize = 1024
priority = <13>
type = udp
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
autoLBVolume = 0
blockOnCloning = true
blockWarnThreshold = 100
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
compressed = false
connectionTimeout = 20
defaultGroup = gd_indexers
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
ecdhCurves = prime256v1, secp384r1, secp521r1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = auto
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
sslVersions = tls1.2
tcpSendBufSz = 0
useACK = true
writeTimeout = 300
[tcpout:gd_indexers]
server = <List of Internal IPs>

If anyone can suggest an avenue for troubleshooting, it would be greatly appreciated. Please also let me know if I can provide more relevant information.

laurie_gellatly · 11-18-2018

You said you looked at indextime. Did that include looking at the indextime for both copies of the same event?
Pick a few duplicated events and look for any differences between the copies:
indextime, host, splunk_server... is there anything that differs?

...Laurie:{)

laurie_gellatly · 11-18-2018

Is there any difference between the original and its duplicate(s)?
I.e., for each event, how does it differ from its duplicate? Is there only one extra copy of each duplicated event, or more?
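One way to act on this suggestion offline is to export a handful of duplicated events (with fields such as _indextime, host, and splunk_server) and compare the copies field by field. The helper below is a hypothetical Python sketch, not a Splunk API; it assumes each copy is available as a dict of field name to value, and the sample values are made up.

```python
from typing import Any, Dict, List


def differing_fields(copies: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: return the fields whose values are NOT identical
    across all copies of one duplicated event, mapped to each copy's value.

    Each copy is assumed to be a dict of field name -> value (e.g. one row of
    exported search results). A field missing from a copy is reported as None.
    """
    if not copies:
        return {}
    # Every field that appears in at least one copy.
    all_fields = set().union(*(copy.keys() for copy in copies))
    diffs: Dict[str, List[Any]] = {}
    for field in sorted(all_fields):
        values = [copy.get(field) for copy in copies]
        # Compare via repr() so unhashable values do not break the check.
        if len({repr(v) for v in values}) > 1:
            diffs[field] = values
    return diffs


# Made-up example: two copies that agree on everything except host.
copies = [
    {"_indextime": "1542300000", "host": "web01", "splunk_server": "idx1"},
    {"_indextime": "1542300000", "host": "web02", "splunk_server": "idx1"},
]
print(differing_fields(copies))  # {'host': ['web01', 'web02']}
```

An empty result means the copies are indistinguishable on every exported field.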

schofiet · 11-19-2018

I used the following search:

index=<my_index> sourcetype=<my_sourcetype>
| eval bucket=_bkt
| eval indextime=_indextime
| table _time indextime bucket splunk_server _raw
| convert ctime(indextime)
| stats count list(*) as * by _raw
| where count>1
| fields * _raw
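For anyone who wants to repeat this check outside Splunk, the grouping done by "stats count list(*) as * by _raw | where count>1" can be reproduced in Python on exported results. This is a minimal sketch, assuming the table above was exported to CSV with the columns _time, indextime, bucket, splunk_server, and _raw; the function names, bucket ID, and sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, str]


def find_duplicates(csv_text: str) -> Dict[str, List[Row]]:
    """Group exported rows by _raw and keep groups with more than one row,
    mirroring 'stats count ... by _raw | where count>1'."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["_raw"]].append(row)
    return {raw: rows for raw, rows in groups.items() if len(rows) > 1}


def copies_identical(
    rows: List[Row], fields: Sequence[str] = ("indextime", "bucket", "splunk_server")
) -> bool:
    """True if every copy has the same value for each of the given fields."""
    return all(len({row[f] for row in rows}) == 1 for f in fields)


# Made-up export: "event A" was indexed twice, "event B" once.
sample = """_time,indextime,bucket,splunk_server,_raw
2018-11-19T10:00:00,11/19/2018 10:00:05,main~42~GUID,idx1,event A
2018-11-19T10:00:00,11/19/2018 10:00:05,main~42~GUID,idx1,event A
2018-11-19T10:01:00,11/19/2018 10:01:02,main~42~GUID,idx1,event B
"""

for raw, rows in find_duplicates(sample).items():
    print(raw, len(rows), copies_identical(rows))  # event A 2 True
```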

In the indextime field, the same value is repeated for every copy of a duplicated event; the same is true for bucket and splunk_server.

There appears to be no difference between the copies. Most duplicated events have just two copies, but occasionally an indexer holds 3 to 5 copies. It's not always the same indexer, either; the duplicates seem to be distributed fairly evenly across the indexers.
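To check whether the extra copies really are spread evenly, one could tally how many duplicated events each indexer holds and how often each copy count (2, 3, 4, 5...) occurs. This is a minimal Python sketch, assuming rows exported with splunk_server and _raw columns; the function name and sample data are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicate_profile(rows: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Return (copy-count histogram, duplicated events per indexer).

    Copies are keyed by (splunk_server, _raw) because, in this thread, all
    copies of an event sit on the same indexer.
    """
    copies = Counter((row["splunk_server"], row["_raw"]) for row in rows)
    # How many events were stored 2 times, 3 times, and so on.
    histogram = Counter(n for n in copies.values() if n > 1)
    # How many duplicated events each indexer holds.
    per_indexer = Counter(server for (server, _raw), n in copies.items() if n > 1)
    return histogram, per_indexer


# Invented sample: "a" stored twice on idx1, "b" three times on idx2, "c" once.
rows = (
    [{"splunk_server": "idx1", "_raw": "a"}] * 2
    + [{"splunk_server": "idx2", "_raw": "b"}] * 3
    + [{"splunk_server": "idx2", "_raw": "c"}]
)
histogram, per_indexer = duplicate_profile(rows)
print(dict(histogram))    # {2: 1, 3: 1}
print(dict(per_indexer))  # {'idx1': 1, 'idx2': 1}
```

A roughly flat per-indexer count would match the "relatively evenly distributed" pattern; a skewed one would point at specific indexers.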

laurie_gellatly · 11-19-2018

Found this: https://answers.splunk.com/answers/365914/why-are-we-seeing-duplicate-events-found-in-an-ind.html
An additional incentive to upgrade?

schofiet · 11-19-2018

We're working on upgrading to 7.2.x as soon as we can get it scheduled. The linked question points to 6.4 as the fix, and we're already on 6.6. I appreciate you taking the time to post a suggestion, though!

What causes duplicate data in an indexer cluster?
