Deployment Architecture

Why am I getting "bucket not serviceable" errors on an indexer cluster master and replication is failing for some buckets?

austinament
Engager

I'm receiving "bucket not serviceable" errors on the cluster master, and replication is failing for some buckets. What's the best way to resolve this?

rbal_splunk
Splunk Employee

If the cluster peers crash often in a clustered deployment, you can end up with buckets in this state.
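
If you suspect peer crashes, one quick check (the path assumes a default install location under /opt/splunk) is to look for crash logs on each indexer:

    # splunkd writes a crash log per crash under var/log/splunk
    ls -lt /opt/splunk/var/log/splunk/crash-*.log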

sowings
Splunk Employee

A bucket is considered "serviceable" when there is at least one backup copy aside from the main copy. Even if that backup is "merely" a replica copy, a fully searchable version can be built from it. It's the bare minimum of data security for the cluster.

What might lead to a bucket not being serviceable?

  • When the data first arrives, the indexer creates a hot bucket to store the data. It notifies the cluster master that it has new data.
  • The cluster master replies with a list of peers to which to send duplicate "streams". If the indexer can't talk to the cluster master (down, or some weird network issue), then it will continue with the list of replica peers it had before. If all of those peers are also unavailable (down, or unreachable over the network), then this host will be stuck holding the only copy of the data. This bucket is not yet serviceable.
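
If you want to see which buckets the cluster master currently considers non-serviceable, one way is to query its bucket list over the splunkd REST API. This is just a sketch: the host name and credentials are placeholders, and the endpoint and field names should be checked against the REST API reference for your Splunk version.

    # Run against the cluster master; count=0 returns all bucket entries.
    # Inspect the per-bucket status fields in the output for the affected buckets.
    curl -k -u admin:changeme \
        "https://cm.example.com:8089/services/cluster/master/buckets?output_mode=json&count=0"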

How then do we recover from this?

  • Hot buckets will naturally roll to warm buckets. A few dials in indexes.conf govern the timing of this transition; it can be based on time, size, or several other conditions (see the snippet after this list).
  • When the bucket becomes warm, it becomes a non-streaming source. The cluster master can now prompt the primary host to replicate this bucket elsewhere to create a backup copy. When this happens, the bucket is now "serviceable".
  • In some cases, however, the hot-to-warm transition alone is not enough to prompt "fix up" activity for the bucket. This can be fixed by restarting the indexer: upon joining the cluster, the indexer provides a list of all of the data buckets present on its filesystem, and the cluster master merges this list with the lists from the other indexers. If at this point the CM recognizes that there aren't enough copies of a bucket to meet policy (replication_factor and search_factor), it triggers "fix up" activity by making extra copies or prompting replica copies to become searchable.
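
For reference, the usual hot-to-warm dials in indexes.conf look something like the snippet below. The stanza name and values are purely illustrative; check the indexes.conf spec for your version before changing anything.

    # indexes.conf (per-index stanza), deployed to the indexers
    [my_index]                 # hypothetical index name
    maxDataSize = auto         # roll to warm when the hot bucket reaches this size
    maxHotSpanSecs = 86400     # roll when the bucket spans more than this many seconds of event time
    maxHotIdleSecs = 0         # if set above 0, roll after this many seconds with no new data

If you'd rather not wait for a natural roll, splunkd also exposes a way to roll an index's hot buckets on demand; something along these lines (again, verify against your version's REST/CLI documentation):

    splunk _internal call /data/indexes/my_index/roll-hot-buckets -method POST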

What's the long-term strategy?

Ensure good network connectivity between your cluster master and indexers. I myself have entered a couple of mistaken firewall rules, halting communication between the cluster peers, and thereby generated a few "non-serviceable" buckets. It's a recoverable situation. Don't panic. As strange as it is to say, sometimes simply restarting Splunk will be enough!
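
A quick way to sanity-check things after fixing connectivity (or restarting) is to ask the cluster master for its view of the peers. The command below is the standard cluster admin CLI call, run on the cluster master; the exact output varies by version.

    # Shows peer status and whether the replication and search factors are currently met.
    splunk show cluster-status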

briangalka
New Member

I have the same message on my cluster master as well, and would also be interested in the answer. Thanks.

rphillips_splk
Splunk Employee

I had a similar issue where the CM fixup tasks (under Generation) reported "cannot fix up search factor as bucket is not serviceable", and those fixup tasks never cleared. It turned out that the splunkd management port (8089) was not open between the indexers, and it needs to be.

https://answers.splunk.com/answers/714848/why-is-the-cluster-master-not-able-to-fixup-bucket.html?ch...
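
A simple connectivity check from one indexer to another will confirm whether the management port is actually open. The host name here is a placeholder.

    # From indexer A, test indexer B's splunkd management port.
    nc -zv peer-indexer.example.com 8089
    # Or hit splunkd directly; even a 401/authentication error proves the port is reachable.
    curl -k https://peer-indexer.example.com:8089/services/server/info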
