
When initiating a rolling restart for an indexer cluster, why do we see a lot of bucket fix-up tasks?

arns

Hello,

I'd like to know why, when we initiate a rolling restart of the indexer cluster, we see so many bucket fix-up tasks (for both the search factor and the replication factor). They put a lot of CPU and I/O pressure on the cluster.

I understand that fix-up is required when a node goes down or comes back up (leaves or joins the cluster), but I don't understand why this should happen for a simple splunkd restart managed by the master.

Regards,


ddrillic

http://docs.splunk.com/Documentation/Splunk/6.1/Indexer/Restartthecluster says:
"When you restart a master or peer node, the master rebalances the primary bucket copies across the set of peers,..."

For some reason, it doesn't describe the load a rolling restart puts on the cluster.


muebel

The rolling restart essentially runs "splunk offline" on each indexer, one by one. An "offline" of an indexer cluster slave is a controlled shutdown: every bucket for which it holds the primary copy is transitioned to another slave, and buckets are replicated as needed to maintain the replication factor, or made searchable as needed to maintain the search factor. That accounts for most of the fix-up tasks.
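To make that mechanism concrete, here is a toy model (not Splunk's actual implementation) of the fix-up work a master might queue when a single peer goes offline. It assumes each bucket is tracked as a set of peers holding copies, a subset of searchable copies, and one primary; the `Bucket` class, the `offline_fixups` function, and the peer names `idx1`..`idx3` are all hypothetical, and a copy on the offline peer is simply assumed to stop counting toward the replication and search factors.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical bucket record; real Splunk bucket metadata is much richer.
    bid: str
    copies: set[str]      # peers holding a copy of this bucket
    searchable: set[str]  # subset of `copies` that are searchable
    primary: str          # peer holding the primary copy


def offline_fixups(buckets: list[Bucket], peer: str, rf: int, sf: int) -> list[tuple[str, str]]:
    """Toy model: fix-up tasks queued when `peer` goes offline (not Splunk's code)."""
    tasks = []
    for b in buckets:
        if peer not in b.copies:
            continue  # this bucket is unaffected by the peer leaving
        remaining = b.copies - {peer}
        remaining_searchable = b.searchable - {peer}
        if b.primary == peer:
            # The primary must move to another searchable copy.
            tasks.append(("reassign_primary", b.bid))
        if len(remaining) < rf:
            # Too few copies left: another copy must be replicated.
            tasks.append(("replication_factor", b.bid))
        if len(remaining_searchable) < sf:
            # Too few searchable copies left: one must be made searchable.
            tasks.append(("search_factor", b.bid))
    return tasks


# Example: three hypothetical peers, replication factor 2, search factor 2.
buckets = [
    Bucket("b1", {"idx1", "idx2"}, {"idx1", "idx2"}, "idx1"),
    Bucket("b2", {"idx2", "idx3"}, {"idx2", "idx3"}, "idx2"),
    Bucket("b3", {"idx1", "idx3"}, {"idx1", "idx3"}, "idx3"),
]
tasks = offline_fixups(buckets, "idx1", rf=2, sf=2)
print(len(tasks))  # → 5
```

Taking `idx1` offline touches only the buckets it holds a copy of, yet even in this three-bucket example it produces five tasks; repeat that for every peer in a large cluster and the queue grows quickly.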

That said, a rolling restart can be quite traumatic, particularly on large clusters and especially if you are glued to the Cluster Master console. The cluster will eventually recover in most cases, so it's probably best to kick off the restart, make sure it actually took, and then give it 5 minutes before checking in.

Please let me know if this answers your question! 😄


arns

As per the documentation:

Warning: While the cluster is in maintenance mode, the master will not enforce replication factor or search factor policies. The only bucket fix-up that occurs during maintenance mode is that the master will attempt, when necessary, to reassign primaries to available searchable bucket copies. So, the cluster can be operating under a valid but incomplete status. See Indexer cluster states to understand the implications of this.

Note: The CLI commands apply cluster-bundle and rolling-restart incorporate maintenance mode functionality into their behavior by default, so there's no reason to invoke it explicitly when running those commands. A message stating that maintenance mode is on will appear on the master dashboard when you invoke these actions.

So, as I understand it, there should be no fix-up tasks (other than reassigning primaries to searchable copies) during a cluster rolling restart. But there are... a lot of them.
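One way to frame the discrepancy is to count, over a whole rolling restart, the tasks a toy master would queue with and without maintenance mode. This is a sketch under stated assumptions, not Splunk's logic: `placement` and `rolling_restart_fixups` are hypothetical, every copy is assumed searchable, peers restart strictly one at a time, and a restarting peer's copies are assumed to count as missing for both factors. Per the quoted documentation, only primary reassignment should remain when `maintenance_mode` is on.

```python
from collections import Counter


def rolling_restart_fixups(placement: dict[str, list[str]], peers: list[str],
                           maintenance_mode: bool) -> Counter:
    """Toy model: count fix-up tasks while restarting `peers` one at a time.

    `placement` maps a hypothetical bucket id to the peers holding a searchable
    copy; the first entry is the primary. Not Splunk's implementation.
    """
    counts = Counter()
    placement = {bid: list(holders) for bid, holders in placement.items()}  # don't mutate input
    for peer in peers:
        for holders in placement.values():
            if peer not in holders:
                continue
            if holders[0] == peer:
                # Primary moves to the next copy; allowed even in maintenance mode.
                holders.append(holders.pop(0))
                counts["reassign_primary"] += 1
            if not maintenance_mode:
                # While the peer is down its copy is missing, so RF and SF are unmet.
                counts["replication_factor"] += 1
                counts["search_factor"] += 1
    return counts


placement = {"b1": ["idx1", "idx2"], "b2": ["idx2", "idx3"], "b3": ["idx3", "idx1"]}
peers = ["idx1", "idx2", "idx3"]
print(rolling_restart_fixups(placement, peers, maintenance_mode=True))
print(rolling_restart_fixups(placement, peers, maintenance_mode=False))
```

Under this model, maintenance mode leaves only `reassign_primary` entries in the counter, so a real cluster showing many replication-factor and search-factor tasks during a rolling restart is behaving differently from the documented expectation, which is the point raised above.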
