Deployment Architecture

What should I do with bad buckets in a clustered environment that are now affecting search and replication factors in Splunk 6.1.4?

Communicator

I have a clustered Splunk environment (i.e., with bucket replication) consisting of:

--One cluster master
--Five cluster peers
--One search head

One of our cluster peers ran out of disk space on the partition holding hot+warm buckets; as a result, some bad buckets were created.
We have resolved the disk issue, but the cluster master is now reporting bad buckets, and consequently the search factor and replication factor are not met.

Messages such as this one appear as warnings on the Cluster Master:

Search peer indexer01.example.com has the following message: Failed to make bucket = improbable_logs~1368~D823EFB4-14AA-4C97-9500-E21A12608EC4 searchable, retry count = 13.

This is Splunk Version 6.1.4


Re: What should I do with bad buckets in a clustered environment that are now affecting search and replication factors in Splunk 6.1.4?

Splunk Employee

Since you already know the root cause of these bad buckets, and you have analyzed and concluded that they cannot be recovered, you can delete them using the commands listed below.

For this discussion, let's say the bad bucket to be deleted belongs to index=audit and has bucket ID "audit~1~350142A5-6AFF-4852-A45C-2A7CDF8FE540".

To delete this bucket, run the following Splunk commands on the cluster master.

First, put the cluster master into maintenance mode:

$SPLUNK_HOME/bin/splunk enable maintenance-mode

Then use the command below to delete the bucket. Note that running this command on the cluster master physically deletes the bucket from all the peers.

$SPLUNK_HOME/bin/splunk _internal call /services/cluster/master/buckets/audit~1~350142A5-6AFF-4852-A45C-2A7CDF8FE540/remove_all -method POST

Finally, take the cluster master out of maintenance mode:

$SPLUNK_HOME/bin/splunk disable maintenance-mode

Navigate to the index and verify that the bucket has been deleted.
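The steps above can be sketched as a single script. This is a minimal sketch, not a drop-in tool: it assumes a default install with $SPLUNK_HOME set on the cluster master, and the bucket ID is the example from this thread. The destructive calls are left commented out so you can review them before running anything.

```shell
#!/bin/sh
# Sketch of the bad-bucket removal procedure (assumption: default
# Splunk install, $SPLUNK_HOME set on the cluster master).

# Example bucket from this thread: index=audit
BUCKET_ID="audit~1~350142A5-6AFF-4852-A45C-2A7CDF8FE540"

# REST endpoint that removes the bucket from every peer
ENDPOINT="/services/cluster/master/buckets/${BUCKET_ID}/remove_all"

# 1. Put the cluster master into maintenance mode so bucket-fixup
#    activity pauses while the bucket is deleted:
#    "$SPLUNK_HOME/bin/splunk" enable maintenance-mode

# 2. Delete the bucket from all the peers (DESTRUCTIVE - the raw
#    data in this bucket is lost):
#    "$SPLUNK_HOME/bin/splunk" _internal call "$ENDPOINT" -method POST

# 3. Leave maintenance mode:
#    "$SPLUNK_HOME/bin/splunk" disable maintenance-mode

# Show the endpoint that would be called
echo "$ENDPOINT"
```

Substitute your own index name and bucket ID into BUCKET_ID; the rest of the endpoint path stays the same.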



Re: What should I do with bad buckets in a clustered environment that are now affecting search and replication factors in Splunk 6.1.4?

Explorer

One thing to watch out for in splunkd.log on the cluster master when performing the removal is:

02-11-2015 09:26:16.386 -0600 WARN CMMaster - did not schedule removal for peer=...

It would appear that an fsck or other activity on the peer prevented the removal, even though the REST call returned a 200. In my case, when the peers were restarted, the damaged buckets began replicating again.

Making the same call a few times while watching for the absence of that error in splunkd.log did the trick for me.
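That "repeat and watch the log" check can be sketched as a small helper. This is an assumption-laden sketch, not an official tool: the splunkd.log path is the default-install location, and check_removal_scheduled is a hypothetical helper name; the actual remove_all call is shown only as a comment.

```shell
#!/bin/sh
# Sketch: after each remove_all attempt, confirm the CMMaster
# "did not schedule removal" warning did NOT appear in splunkd.log.
# Assumption: default log location under $SPLUNK_HOME.
LOG="${SPLUNK_HOME:-/opt/splunk}/var/log/splunk/splunkd.log"
WARN_PATTERN="did not schedule removal"

# Returns 0 (success) when the warning is absent from the given log file.
check_removal_scheduled() {
  ! grep -q "$WARN_PATTERN" "$1"
}

# Usage after each remove_all attempt (bucket ID elided here):
#   "$SPLUNK_HOME/bin/splunk" _internal call \
#       "/services/cluster/master/buckets/<bucket_id>/remove_all" -method POST
#   tail -n 200 "$LOG" > /tmp/recent.log
#   check_removal_scheduled /tmp/recent.log && echo "removal scheduled"
```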


Re: What should I do with bad buckets in a clustered environment that are now affecting search and replication factors in Splunk 6.1.4?

Explorer

By deleting the bucket, the data will be lost, correct? Is there no alternative that avoids losing the raw data?


Re: What should I do with bad buckets in a clustered environment that are now affecting search and replication factors in Splunk 6.1.4?

Splunk Employee

You are correct: deleting the bucket will cause that data to be lost. If you need to preserve it, log a Splunk Support case.
