Receiving "bucket not serviceable" errors on the cluster master, and replication is failing for some buckets. What's the best way to resolve this?
If the cluster peers crash often in a clustered deployment, you may end up with buckets in this state.
A bucket is considered "serviceable" when at least one copy exists in addition to the primary copy. Even if that copy is "merely" a replica, a fully searchable copy can be built from it. It's the bare minimum of data security for the cluster.
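A quick way to see what the cluster master currently thinks of each bucket is its REST API. This is only a rough sketch to run on the cluster master itself; the credentials are placeholders, and endpoint names can vary by version (newer releases use "manager" where older ones use "master"):

    # List bucket state as the cluster master sees it (returns JSON because of output_mode)
    curl -k -u admin:changeme "https://localhost:8089/services/cluster/master/buckets?output_mode=json"
    # Pending fixup tasks for a given level (e.g. generation, search_factor, replication_factor)
    curl -k -u admin:changeme "https://localhost:8089/services/cluster/master/fixup?level=generation&output_mode=json"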
What might lead to a bucket not being serviceable?
How then do we recover from this?
What's the long-term strategy?
Ensure good network connectivity between your cluster master and indexers. I have myself entered a couple of mistaken firewall rules that halted communication between the cluster peers and generated a few "non-serviceable" buckets. It's a recoverable situation, so don't panic. As strange as it sounds, sometimes simply restarting Splunk is enough! (See the checks below.)
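For what it's worth, here is a minimal way to sanity-check the paths involved. The hostnames are placeholders, and 9887 is only the commonly used example replication port; substitute whatever you set in server.conf on your peers:

    # From the cluster master and from each peer: is the other peers' management port reachable?
    nc -zv idx01.example.com 8089
    # From each peer: is the other peers' replication port reachable?
    nc -zv idx02.example.com 9887
    # If connectivity checks out and buckets are still stuck, restart the affected peer...
    $SPLUNK_HOME/bin/splunk restart
    # ...or, from the cluster master, restart all peers in a controlled way
    $SPLUNK_HOME/bin/splunk rolling-restart cluster-peers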
I have the same message on my cluster master as well, and would also be interested in the answer. Thanks.
I had a similar issue where the CM fixup tasks under "generation" reported "cannot fix up search factor as bucket is not serviceable" and those fixup tasks never cleared. It turned out that the splunkd management port (8089) was not open between the indexers, and it needs to be.
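In case it helps someone else, this is roughly how I confirmed it. The hostnames are placeholders; run the port check from each indexer toward its peers, and the status command on the cluster master:

    # From indexer A, confirm indexer B's splunkd management port is reachable
    nc -zv idx02.example.com 8089
    # On the cluster master, confirm the peers show as Up and the fixup tasks drain once the port is open
    $SPLUNK_HOME/bin/splunk show cluster-status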