I have a search head cluster, and the KV store is stuck in the "starting" stage on all of the search heads.
Error: KV Store changed the status to failed. Failed to establish communication with KVStore. See splunkd.log for details.
When I removed a search head from the cluster, took a backup, cleaned the KV store, and restored the backup, the KV store status became "Ready". But when we added it back to the search head cluster, the KV store failed again.
Errors:
08-27-2019 05:37:01.026 -0500 ERROR KVStorageProvider - An error occurred during the last operation ('getServerVersion', domain: '15', code: '13053'): No suitable servers found (serverSelectionTryOnce set): [connection closed calling ismaster on 'hostname:8191']
08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf - cpu usage saturation"
08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf Collect - duplicated xvzf instances may occur (excessive nbr of process launched)"
How can I troubleshoot this issue further?
As a next step, please try the following:
Point 1: Remove all nodes from the SH cluster:
1.1. Edit server.conf ($SPLUNK_HOME/etc/system/local/server.conf) and comment out the [shclustering] stanza (see the example after step 1.2.3).
1.2. Clean Raft
1.2.1 Stop the member: ./splunk stop
1.2.2 Clean the member's raft folder: ./splunk clean raft
1.2.3 Start the member: ./splunk start
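For illustration only, the commented-out stanza in server.conf might look like this (the hostname, label, and URLs below are placeholders, not values from this environment):
#[shclustering]
#conf_deploy_fetch_url = https://deployer.example.com:8089
#mgmt_uri = https://sh1.example.com:8089
#pass4SymmKey = <your pass4SymmKey>
#shcluster_label = shcluster1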
Point 2: Take a KV store backup
2.1. Stop Splunk: ./splunk stop
2.2. Take a backup of the DB (keep it safe; it may be needed in an emergency):
2.3. cp -r $SPLUNK_DB/kvstore/mongo /backup
2.4. Rename mongod.lock (located in $SPLUNK_DB/kvstore/mongo) to mongod.lock_bkp
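For example, the rename in step 2.4 can be done like this (assuming the default $SPLUNK_DB layout):
mv $SPLUNK_DB/kvstore/mongo/mongod.lock $SPLUNK_DB/kvstore/mongo/mongod.lock_bkp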
Point 3: Clean up the KV store
3.1 Stop the member: ./splunk stop
3.2 Clean the member's KV store: ./splunk clean kvstore --local
3.3 Start the member: ./splunk start
Point 4. Restore the KV store from the backup:
cp -r /backup/mongo $SPLUNK_DB/kvstore/
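If Splunk runs as a dedicated user, it's also worth confirming the restored files are still owned by that user; the splunk:splunk user/group here is an assumption, adjust to your install:
chown -R splunk:splunk $SPLUNK_DB/kvstore/mongo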
Point 5. Check the KV store status on each node:
./splunk show kvstore-status
Point 6. Create the SH cluster again (Note: don't just re-create the cluster by un-commenting the shclustering stanza). Initialize all members of the shcluster:
./splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <replication_port> -replication_factor <n> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>
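As a worked example with three members and a deployer (all hostnames, ports, the secret, and the label below are illustrative, not values from this environment):
./splunk init shcluster-config -auth admin:changeme -mgmt_uri https://sh1.example.com:8089 -replication_port 9200 -replication_factor 3 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret mysecret -shcluster_label shcluster1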
Point 7. Bring up the cluster captain - Bootstrap first member
./splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
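Continuing the same illustrative values from the example above:
./splunk bootstrap shcluster-captain -servers_list "https://sh1.example.com:8089,https://sh2.example.com:8089,https://sh3.example.com:8089" -auth admin:changeme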
Point 8. Check the SH cluster status:
./splunk show shcluster-status -auth <username>:<password>
Point 9. Check the KV store status:
./splunk show kvstore-status -auth <username>:<password>
Great step-by-step instructions. Thank you.
Thank you! I can recommend using this solution!