I was able to use the following "Answers" post to get my three-member SHC KV Store up and running again:
https://answers.splunk.com/answers/375964/why-is-our-search-head-cluster-captain-generating.html
I used "Option 1" and as suggested I backed up the KV Store folders from each of the three members.
The link does not provide an explanation for how to recover the backed up kvstore data.
How do I recover the data from the backed up kvstore folders/files and append/prepend it to the already accumulating data in my now reactivated KV Stores?
THX
Hi mdwecht,
You have tried the following instructions in the admin manual about restoring KV data, right?
http://docs.splunk.com/Documentation/Splunk/6.5.0/Admin/BackupKVstore#Restore_the_KV_Store_data
Did you run into any errors during the process? Did you try the steps described in the same manual to troubleshoot the KV Store restoration?
http://docs.splunk.com/Documentation/Splunk/6.5.0/Admin/TroubleshootKVstore
Thanks!
Hunter
Hunter,
Yes, I have followed the steps put forward in the Admin Manual. Understand that I am using CentOS 6.7 w/ Splunk 6.4.0 and some of the KV Store tools available in later Splunk versions are not yet available to me. I am using curl at the CLI and also using Splunk to look at the mongod and splunkd logs.
This is how I got to where I am:
The trouble began when an overzealous admin accidentally deleted directories from one of my three running SHC members.
From one of the other members, I used CLI commands to remove the corrupted member, which seemed to work. I then killed the Splunk-related zombie processes still running on the corrupted member's server. Because the pid file directory and bin directory were gone, I couldn't use the corrupted member's own CLI.
I un-tarred a fresh copy of Splunk 6.4.0 into /opt/splunk to replace the corrupted instance, then followed the Splunk docs for "init" and "add new" to join it to the shcluster. All went smoothly: the new instance came up fine, and then I forced it to resync. (At that point I didn't know that I should have copied the kvstore directory from one of the existing members to the new SHC member. I assumed adding the new member to the SHC would sort that out.)
The KV Store was not properly initialized.
In mongod.log on the two old shcluster members I saw events like:
Error in heartbeat request to ------------ InvalidReplicaSetConfig Our replica set configuration is invalid or does not include us
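To see which members report this error, and how often, the log can be filtered for the error name. The following is only a minimal sketch in Python, assuming mongod.log is plain text with one event per line. The function name `find_replica_config_errors` and the script name in the usage comment are made up for illustration and are not part of Splunk.

```python
from typing import Iterable, List

# Error name copied from the mongod.log event quoted above.
ERROR_MARKER = "InvalidReplicaSetConfig"


def find_replica_config_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the replica set config error."""
    return [line.rstrip("\n") for line in lines if ERROR_MARKER in line]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python find_errors.py /path/to/mongod.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for hit in find_replica_config_errors(log):
                print(hit)
```

Running it against mongod.log on each member shows whether the error is still being logged after a fix.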
I got past all of this and got the cluster back up with the KV Store status "ready" by following Option 1 in the post linked below. That gave me a "clean" KV Store across the SHC and three backup files containing the kvstore folders from the three SHC members. Once the KV Store was ready in my SHC, the stores that collect information started doing just that, but the stores that hold enrichment data are empty. I need to find a way to restore the data from the backups to replace the missing enrichment data while retaining the data now being collected in the other stores.
Used Option 1 in the following "Answers" post to get the KV Store up and running (less the data I need to restore):
https://answers.splunk.com/answers/375964/why-is-our-search-head-cluster-captain-generating.html
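The goal described above (bring back the old records without losing the ones collected since) amounts to a merge keyed on each record's `_key` field, where records already in the live collection win and backed-up records only fill the gaps. The following is a rough sketch in Python, not a tested recovery procedure. It assumes both sides are already available as lists of JSON-style objects, which the raw kvstore folder backups do not provide directly, so getting the backed-up records into that form is an open step. `merge_records` is a hypothetical helper name.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge_records(live: List[Record], backup: List[Record]) -> List[Record]:
    """Combine two record lists keyed on '_key', preferring live records.

    Records without a '_key' cannot be matched, so they are kept as-is.
    """
    merged: Dict[str, Record] = {}
    unkeyed: List[Record] = []

    # Load the backup first so live records overwrite on a key collision.
    for record in backup + live:
        key = record.get("_key")
        if key is None:
            unkeyed.append(record)
        else:
            merged[key] = record

    return list(merged.values()) + unkeyed
```

The merged list would still have to be written back to the collection, for example through the KV Store REST API; that step is left out here because it depends on the app and collection names.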
THX
I did find some guidance on restoring KV Store data in the Splunk Admin Manual, but none of the suggested steps have worked for me so far. I backed up the kvstore folder from the two SHC members that were not corrupted. The two backups are not the same size; I'm not sure if that matters. I attempted to use one backup and then the other to initialize each SHC member, but both attempts failed. I may attempt to follow the steps for initializing a new SHC from an old SHC. I would welcome any advice.
I am using Splunk 6.4.0 on CentOS 6.7