Getting Data In

Bucket replication queue full causing random indexer slowdown.

sbhale
Explorer

Had a weird issue where queues would fill up on random indexers, and the problem would rove around within the cluster.
I opened a case with support and worked through all sorts of adjustments, ruling out all sorts of issues, to no avail.
Finally had a breakthrough when I noticed that we were seeing
INFO BucketReplicator - replication queue for peer=<guid> bid=<bid> is full.
Followed almost immediately by
INFO BucketReplicator - replication queue for peer=<guid> bid=<bid> has room now.
over and over again. The gap between those two messages was only a few milliseconds.

No other obvious ERROR pointing to the cause.

1 Solution

sbhale
Explorer

Answering my own question so others will find it useful.

The presence of the above messages, repeating with the same peer GUID, turned out to be the problem.
One of our peer nodes was acting up and slowing down any node replicating to it. The slowdown was slight, but enough that it propagated and caused queues to back up across the cluster.
The solution was to put that node in manual detention until it could be rebuilt or retired.
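
For anyone wanting to check for the same pattern, here is a rough sketch (not something support gave us) that counts the "is full" messages per peer GUID, so a single misbehaving peer stands out. It assumes the lines look like the messages quoted in the question. The function name and the sample GUIDs and bucket ids are made up for illustration.

```python
import re
from collections import Counter

# Matches the "queue is full" message quoted in the question and captures the peer GUID.
FULL_RE = re.compile(r"BucketReplicator\s+-\s+replication queue for peer=(\S+) bid=\S+ is full")

def count_full_events(lines):
    """Return a Counter mapping peer GUID -> number of 'is full' messages."""
    counts = Counter()
    for line in lines:
        match = FULL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines; real input would be read from splunkd.log on each indexer.
sample = [
    "INFO BucketReplicator - replication queue for peer=GUID-A bid=main~1~X is full.",
    "INFO BucketReplicator - replication queue for peer=GUID-A bid=main~1~X has room now.",
    "INFO BucketReplicator - replication queue for peer=GUID-A bid=main~2~X is full.",
    "INFO BucketReplicator - replication queue for peer=GUID-B bid=main~3~Y is full.",
]

for peer, hits in count_full_events(sample).most_common():
    print(peer, hits)
```

If one GUID dominates the counts on several indexers, that peer is the one to look at first.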

