I have a search head cluster, and the KV store is stuck in the "starting" stage on all of the search heads.
Error: KV Store changed the status to failed. Failed to establish communication with KVStore. See splunkd.log for details.
When I removed a search head from the cluster, took a backup, cleaned the KV store, and restored the backup, the KV store status became "Ready". But when we added it back to the search head cluster, the KV store failed again.
Errors:
08-27-2019 05:37:01.026 -0500 ERROR KVStorageProvider - An error occurred during the last operation ('getServerVersion', domain: '15', code: '13053'): No suitable servers found (serverSelectionTryOnce set): [connection closed calling ismaster on 'hostname:8191']
08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf - cpu usage saturation"
08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf Collect - duplicated xvzf instances may occur (excessive nbr of process launched)"
How can I troubleshoot this issue further?
As a next step, please try the following:
Point 1: Remove all nodes from the SH cluster:
1.1. Edit server.conf ($SPLUNK_HOME/etc/system/local/server.conf) and comment out the [shclustering] stanza (see the example after step 1.2.3).
1.2. Clean Raft
1.2.1 Stop the member: ./splunk stop
1.2.2 Clean the member's raft folder: ./splunk clean raft
1.2.3 Start the member: ./splunk start
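For illustration only, the commented-out stanza in server.conf might look like this (the hostname, label, and URLs below are placeholders, not values from this environment):
#[shclustering]
#conf_deploy_fetch_url = https://deployer.example.com:8089
#mgmt_uri = https://sh1.example.com:8089
#pass4SymmKey = <your pass4SymmKey>
#shcluster_label = shcluster1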
Point 2: Take a KV store backup
2.1. Stop Splunk: ./splunk stop
2.2. Take a backup of the DB (keep it safe; it may be needed in an emergency):
2.3. cp -r $SPLUNK_DB/kvstore/mongo /backup
2.4. Rename mongod.lock (located in $SPLUNK_DB/kvstore/mongo) to mongod.lock_bkp
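For example, the rename in step 2.4 can be done like this (assuming the default $SPLUNK_DB layout):
mv $SPLUNK_DB/kvstore/mongo/mongod.lock $SPLUNK_DB/kvstore/mongo/mongod.lock_bkp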
Point 3: Clean up the KV store
3.1 Stop the member: ./splunk stop
3.2 Clean the member's KV store: ./splunk clean kvstore --local
3.3 Start the member: ./splunk start
Point 4. Restore the KV store from the backup:
cp -r /backup/mongo $SPLUNK_DB/kvstore/
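If Splunk runs as a dedicated user, it's also worth confirming the restored files are still owned by that user; the splunk:splunk user/group here is an assumption, adjust to your install:
chown -R splunk:splunk $SPLUNK_DB/kvstore/mongo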
Point 5. Check the KV store status on each node:
./splunk show kvstore-status
Point 6. Create the SH cluster again (Note: don't just re-create the cluster by un-commenting the shclustering stanza). Initialize all members of the shcluster:
./splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <replication_port> -replication_factor <n> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>
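As a worked example with three members and a deployer (all hostnames, ports, the secret, and the label below are illustrative, not values from this environment):
./splunk init shcluster-config -auth admin:changeme -mgmt_uri https://sh1.example.com:8089 -replication_port 9200 -replication_factor 3 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret mysecret -shcluster_label shcluster1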
Point 7. Bring up the cluster captain - Bootstrap first member
./splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
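Continuing the same illustrative values from the example above:
./splunk bootstrap shcluster-captain -servers_list "https://sh1.example.com:8089,https://sh2.example.com:8089,https://sh3.example.com:8089" -auth admin:changeme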
Point 8. Check the SH cluster status:
./splunk show shcluster-status -auth <username>:<password>
Point 9. Check the KV store status:
./splunk show kvstore-status -auth <username>:<password>
Great step-by-step instructions. Thank you.
Thank you! I can recommend using this solution!