Why am I getting "bucket not serviceable" errors o...

austinament · ‎05-23-2016

I'm receiving "bucket not serviceable" errors on the cluster master, and replication is failing for some buckets. What's the best way to resolve this?

rbal_splunk · ‎12-03-2018

If the cluster peers crash often in a clustered deployment, you may end up with buckets in this state.

sowings · ‎03-13-2018

A bucket is considered "serviceable" when there is at least one backup copy aside from the main copy. Even if that backup is "merely" a replica copy, a fully searchable version can be built from it. It's the bare minimum of data security for the cluster.

What might lead to a bucket not being serviceable?

When data first arrives, the indexer creates a hot bucket to store it and notifies the cluster master that it has new data.
The cluster master replies with a list of peers to which the indexer should send duplicate "streams". If the indexer can't talk to the cluster master (because it is down, or because of some network issue), it continues with the list of replica peers it had before. If all of those peers are also unavailable (down, or unreachable over the network), the indexer is stuck holding the only copy of the data, and the bucket is not yet serviceable.
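To make the definition concrete, here is a minimal Python sketch of the situation described above. It is a toy model, not Splunk code: the `Bucket` class, the function names, and the peer names are all made up for illustration, and the only rule it encodes is "serviceable means at least one copy besides the main one".

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy model of a bucket (hypothetical, not a Splunk data structure)."""
    bucket_id: str
    # Names of the peers that currently hold a copy, origin included.
    holders: set = field(default_factory=set)


def is_serviceable(bucket):
    """Serviceable = the main copy plus at least one backup copy."""
    return len(bucket.holders) >= 2


def stream_hot_bucket(bucket_id, origin, targets, reachable):
    """Simulate the origin indexer streaming a hot bucket to its target peers.

    Only targets that are actually reachable end up with a copy.
    """
    bucket = Bucket(bucket_id, {origin})
    for peer in targets:
        if peer in reachable:
            bucket.holders.add(peer)
    return bucket
```

With one target reachable, `is_serviceable(stream_hot_bucket("b1", "idx1", ["idx2", "idx3"], {"idx2"}))` is true. With no target reachable, the origin holds the only copy and the same check is false.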

How then do we recover from this?

Hot buckets will naturally roll to warm buckets. A few possible dials in indexes.conf govern the timing of this transition. The roll can be triggered by time, size, or several other conditions.
When the bucket becomes warm, it becomes a non-streaming source. The cluster master can now prompt the primary host to replicate this bucket elsewhere to create a backup copy. Once that copy exists, the bucket is "serviceable".
In some cases, however, the hot-to-warm transition alone is not enough to prompt "fix up" activity for the bucket. This can be fixed by restarting the indexer: upon joining the cluster, the indexer provides a list of all of the data buckets present on its filesystem, and the cluster master merges this list with the lists from the other indexers. If at this point the CM recognizes that there aren't enough copies of the bucket to meet policy (replication_factor and search_factor), it will trigger "fix up" activity by making extra copies or prompting replica copies to become searchable.
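The merge-and-compare step described above can be sketched roughly like this. It is a simplified illustration in Python, not how the cluster master is actually implemented: the report format (peer name mapped to bucket IDs with a searchable flag) and both function names are assumptions made for the example.

```python
from collections import defaultdict


def merge_reports(peer_reports):
    """Merge per-peer bucket lists into {bucket_id: {peer: searchable}}."""
    merged = defaultdict(dict)
    for peer, buckets in peer_reports.items():
        for bucket_id, searchable in buckets.items():
            merged[bucket_id][peer] = searchable
    return dict(merged)


def plan_fixup(peer_reports, replication_factor, search_factor):
    """Return, per bucket, how many copies and searchable copies are missing.

    Buckets that already meet both factors are left out of the result.
    """
    tasks = {}
    for bucket_id, copies in merge_reports(peer_reports).items():
        total = len(copies)
        searchable = sum(1 for flag in copies.values() if flag)
        copies_to_add = max(0, replication_factor - total)
        searchable_to_add = max(0, search_factor - searchable)
        if copies_to_add or searchable_to_add:
            tasks[bucket_id] = {
                "copies_to_add": copies_to_add,
                "searchable_to_add": searchable_to_add,
            }
    return tasks
```

For instance, if a hypothetical peer `idx1` reports searchable copies of `b1` and `b2`, and `idx2` reports only a non-searchable copy of `b1`, then with both factors set to 2 the plan asks for one more searchable copy of `b1`, and one more copy plus one more searchable copy of `b2`.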

What's the long term strategy?

Ensure good network connectivity between your cluster master and indexers. I myself have entered a couple of mistaken firewall rules, halting communication between the cluster peers, and thereby generated a few "non-serviceable" buckets. It's a recoverable situation. Don't panic. As strange as it is to say, sometimes simply restarting Splunk will be enough!

briangalka · ‎05-23-2017

I have the same message on my cluster master as well, and would also be interested in the answer. Thanks.

rphillips_splk · ‎01-11-2019

I had a similar issue where the CM fixup task under generation reported "cannot fix up search factor as bucket is not serviceable" and those fixup tasks never cleared. The cause turned out to be that the splunkd management port (8089) was not open between the indexers, and it needs to be.
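If you want a quick way to confirm that kind of problem, a plain TCP connect test run from each indexer toward the others is usually enough. Below is a small Python sketch using only the standard library. The function names are my own, and 8089 is the default management port mentioned above, so adjust it if your deployment uses a different one.

```python
import socket


def port_open(host, port=8089, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(hosts, port=8089):
    """Map each peer host to whether its management port is reachable."""
    return {host: port_open(host, port) for host in hosts}
```

Run something like `check_peers(["idx2.example.com", "idx3.example.com"])` from each indexer (the hostnames are placeholders). Any `False` points at a firewall rule or a splunkd that is down. Note that this only proves the port accepts connections, not that splunkd itself is healthy.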

https://answers.splunk.com/answers/714848/why-is-the-cluster-master-not-able-to-fixup-bucket.html?ch...

Why am I getting "bucket not serviceable" errors on an indexer cluster master and replication is failing for some buckets?
