I have a cluster of 3 indexers with a bunch of indexes. Yesterday I had issues after service restarts on the cluster master. After the peers rejoined the cluster and completed replication to meet the search and replication factors, one of the indexes still shows 0 searchable copies. Replicated copies is 2, as it should be, but the searchable state is set to No.
It seems that something got stuck and searchable copies are not replicating.
Does anybody know how to troubleshoot this, or how to force replication for this particular index?
I am running 5.0.7.
so Peer_URI would be URI for Cluster master, right?
would this re-add cause any outage?
Looks like that endpoint was added in 5.0.2, so it should work in 5.0.7.
Did this problem show up after you restarted the master? In 5.0.x, the cluster master did not track frozen buckets properly. After a restart, the master would then proceed to "fix up" buckets that were previously frozen. See here for more info: http://docs.splunk.com/Documentation/Splunk/5.0.4/Indexer/Upgradeacluster#Why_the_safe_restart_clust.... The UI in 5.0.x also always reported the worst case: i.e., if there was even one bucket with no searchable copy, it would report that index as having no searchable copies. The problem might be caused by just a subset of the buckets, such as these frozen ones.
When you restart the master, a procedure similar to the one mentioned in that upgrade link is needed. I wonder if this could be the problem in your case. Since you've already restarted the master, you can no longer use that script as-is, because the information is already lost. But we might still be able to recover by (1) just giving the cluster enough time, or (2) using search to figure out the list of buckets to be fixed and then scripting it from there.
Try this search on the master from the CLI to get a list of frozen buckets:
$SPLUNK_HOME/bin/splunk search 'index=_internal component=CMMaster "remove bucket" frozen=true | dedup bid | table bid' -preview 0 > /var/tmp/frozen_buckets
and maybe also:
grep myindex /var/tmp/frozen_buckets | wc -l
to see how many such buckets show up for that index. That will tell us whether this is the problem.
I ran your script and parsed for my index, and got 3 bucket IDs. I had a bunch of buckets for other indexes, but those aren't being flagged in the cluster master dashboard.
Can you try this from the CLI:
for i in `cat /var/tmp/frozen_buckets`; do curl -d "" -k -u admin:changeme https://localhost:8089/services/cluster/master/buckets/$i/freeze; done
Then commit a new generation:
$SPLUNK_HOME/bin/splunk _internal call /cluster/master/control/directive/commitgeneration -method POST
You can use a file with just the bucket-ids from the index you care about or just do everything.
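To build that per-index file, you can grep on the index name as a prefix, since clustered bucket IDs take the form <indexname>~<localid>~<originguid>. A minimal sketch, using `myindex` as a placeholder for your index name:

```shell
# Keep only the bucket IDs belonging to one index. Bucket IDs look like
# <indexname>~<localid>~<originguid>, so the index name is a prefix.
# "myindex" below is a placeholder -- substitute your actual index name.
grep '^myindex~' /var/tmp/frozen_buckets > /var/tmp/frozen_myindex
```

Then point the for loop above at /var/tmp/frozen_myindex instead of /var/tmp/frozen_buckets.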
Can you let me know how that goes?
Also, if it is just 3 buckets, you can probably let it be for a bit and it should fix itself up.