Deployment Architecture

What causes duplicate data in an indexer cluster?

schofiet
Explorer

We're seeing an issue on our indexer cluster where about 25% of events are duplicated. The raw logs do not contain duplicates, and there are no duplicate or overlapping monitor stanzas. The bucket ID, index time, and splunk_server are identical across each set of duplicates.

Our indexers are clustered, and we're running Enterprise version 6.6.3 on Windows Server 2012 R2.

Here's our aggregated outputs.conf from a Universal Forwarder:

\splunk btool outputs list

[syslog]
maxEventSize = 1024
priority = <13>
type = udp
[tcpout]
ackTimeoutOnShutdown = 30
autoLBFrequency = 30
autoLBVolume = 0
blockOnCloning = true
blockWarnThreshold = 100
cipherSuite = ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES128-GCM-SHA256:AES128-SHA256:ECDH-ECDSA-AES256-GCM-SHA384:ECDH-ECDSA-AES128-GCM-SHA256:ECDH-ECDSA-AES256-SHA384:ECDH-ECDSA-AES128-SHA256
compressed = false
connectionTimeout = 20
defaultGroup = gd_indexers
disabled = false
dropClonedEventsOnQueueFull = 5
dropEventsOnQueueFull = -1
ecdhCurves = prime256v1, secp384r1, secp521r1
forceTimebasedAutoLB = false
forwardedindex.0.whitelist = .*
forwardedindex.1.blacklist = _.*
forwardedindex.2.whitelist = (_audit|_introspection|_internal|_telemetry)
forwardedindex.filter.disable = false
heartbeatFrequency = 30
indexAndForward = false
maxConnectionsPerIndexer = 2
maxFailuresPerInterval = 2
maxQueueSize = auto
readTimeout = 300
secsInFailureInterval = 1
sendCookedData = true
sslQuietShutdown = false
sslVersions = tls1.2
tcpSendBufSz = 0
useACK = true
writeTimeout = 300
[tcpout:gd_indexers]
server = <List of Internal IPs>
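One way to gather more information is to compare the effective outputs.conf across several forwarders and check that none is configured differently. A minimal sketch, assuming the plain `btool ... list` output format shown above (stanza headers in brackets, `key = value` settings, no `--debug` file-path prefixes). `parse_btool` and `diff_settings` are hypothetical helper names, not part of Splunk.

```python
from typing import Dict, Optional, Tuple

Stanzas = Dict[str, Dict[str, str]]


def parse_btool(text: str) -> Stanzas:
    """Parse plain `splunk btool <conf> list` output into {stanza: {key: value}}.

    Assumes each stanza header is "[name]" and each setting is "key = value".
    Lines before the first stanza (such as the command itself) are ignored.
    """
    stanzas: Stanzas = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas.setdefault(current, {})
        elif "=" in line and current is not None:
            # Split on the first "=" only, so values containing "=" survive.
            key, _, value = line.partition("=")
            stanzas[current][key.strip()] = value.strip()
    return stanzas


def diff_settings(a: Stanzas, b: Stanzas) -> Dict[Tuple[str, str], Tuple[Optional[str], Optional[str]]]:
    """Return {(stanza, key): (value_in_a, value_in_b)} for every setting that differs."""
    diffs = {}
    for stanza in sorted(set(a) | set(b)):
        keys = set(a.get(stanza, {})) | set(b.get(stanza, {}))
        for key in sorted(keys):
            va = a.get(stanza, {}).get(key)
            vb = b.get(stanza, {}).get(key)
            if va != vb:
                diffs[(stanza, key)] = (va, vb)
    return diffs
```

A missing setting shows up as `None` on that side, so a stanza present on only one forwarder is reported setting by setting.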

If anyone can suggest an avenue for troubleshooting, it would be greatly appreciated. Please also let me know if I can provide more relevant information.


laurie_gellatly
Communicator

You said you looked at indextime. Did that include comparing the indextime of both copies of the same event?
Pick a few duplicated events and look for any differences between the copies:
indextime, host, splunk_server... can you see anything that differs?

...Laurie:{)
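The field-by-field comparison suggested above can be sketched in Python, assuming each copy of an event has been exported (for example, from a CSV export of the search results) as a dictionary of field name to value. `differing_fields` is a hypothetical helper, and the field names in the usage are only examples.

```python
from typing import Dict, List, Optional


def differing_fields(copies: List[Dict[str, str]]) -> Dict[str, List[Optional[str]]]:
    """Return the fields whose values are not identical across all copies.

    A field missing from one copy is treated as None for that copy.
    """
    # Union of the field names present in any copy.
    fields = set().union(*copies)
    result: Dict[str, List[Optional[str]]] = {}
    for field in sorted(fields):
        values = [copy.get(field) for copy in copies]
        if len(set(values)) > 1:
            result[field] = values
    return result
```

An empty result means the copies match on every exported field, which is what the original poster reports for indextime, bucket, and splunk_server.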


laurie_gellatly
Communicator

Is there any difference between the original and its duplicate(s)?
That is, for each event, how does it differ from its duplicate? Is there only one extra copy of each duplicated event, or more?


schofiet
Explorer

I used the following search:

index=<my_index> sourcetype=<my_sourcetype>
| eval bucket=_bkt
| eval indextime=_indextime
| table _time, indextime, bucket splunk_server _raw
| convert ctime(indextime)
| stats count list(*) as * by _raw
| where count>1
| fields * _raw
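The grouping that this search performs (`stats count list(*) as * by _raw | where count>1`) can be reproduced offline as a cross-check. A minimal sketch, assuming the results were exported as rows with the same field names the search produces (`_raw`, `indextime`, `bucket`, `splunk_server`). `find_duplicates` and `same_across_copies` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicates(events: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group rows by _raw and keep only groups with more than one row (count>1)."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for event in events:
        groups[event["_raw"]].append(event)
    return {raw: rows for raw, rows in groups.items() if len(rows) > 1}


def same_across_copies(rows: List[Dict[str, str]], field: str) -> bool:
    """True if every copy in the group carries the same value for `field`."""
    return len({row.get(field) for row in rows}) == 1
```

Checking `same_across_copies` for `indextime`, `bucket`, and `splunk_server` on each group corresponds to reading the `list()` columns in the search output.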

Under the indextime field, I saw a single value repeated for each set of duplicate events, and the same was true for bucket and splunk_server.

There appears to be no difference between the duplicates. Most duplicated events have two copies, but occasionally there are 3 to 5 copies on one indexer. It's not always the same indexer either; the duplicates seem relatively evenly distributed across the cluster.
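To put numbers on this distribution, a small Python tally over the same exported rows could count how often each copy count occurs, how many duplicated rows each indexer holds, and what fraction of stored rows are surplus copies. This is a sketch under the same export assumption as above; `copy_stats` is a hypothetical name, and "surplus" here means every copy beyond the first of a given `_raw`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def copy_stats(events: List[Dict[str, str]]) -> Tuple[Counter, Counter, float]:
    """Summarize duplication in exported rows.

    Returns:
      histogram:  copies-per-event -> number of distinct _raw values with that many copies
      per_server: splunk_server -> number of rows belonging to duplicated events
      surplus:    fraction of all rows that are extra copies beyond the first
    """
    per_raw = Counter(e["_raw"] for e in events)
    histogram = Counter(n for n in per_raw.values() if n > 1)
    per_server = Counter(e["splunk_server"] for e in events if per_raw[e["_raw"]] > 1)
    extra = sum(n - 1 for n in per_raw.values())
    surplus = extra / len(events) if events else 0.0
    return histogram, per_server, surplus
```

A roughly flat `per_server` count would match the observation that the duplicates are spread across indexers rather than concentrated on one.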


laurie_gellatly
Communicator

schofiet
Explorer

We're working on upgrading to 7.2.x as soon as we can get it scheduled. The linked question looks like it's talking about 6.4 as the solution; we're on 6.6. Appreciate you taking the time to post a suggestion though!
