Knowledge Management

KV store is stuck at the starting stage for all search heads in the cluster.

dkolekar_splunk
Splunk Employee

I have a search head cluster environment, and the KV store is stuck at the starting stage on all of the search heads.

Error: KV Store changed the status to failed. Failed to establish communication with KVStore. See splunkd.log for details. ..

When I removed the search head from the cluster, took a backup, cleaned the KV store, and restored the backup, the KV store status became "Ready". But when we added it back to the search head cluster, the KV store failed again.

Errors:

8-27-2019 05:37:01.026 -0500 ERROR KVStorageProvider - An error occurred during the last operation ('getServerVersion', domain: '15', code: '13053'): No suitable servers found (serverSelectionTryOnce set): [connection closed calling ismaster on 'hostname:8191']

08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf - cpu usage saturation"

08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf Collect - duplicated xvzf instances may occur (excessive nbr of process launched)"

How can I troubleshoot this issue further?

1 Solution

dkolekar_splunk
Splunk Employee

As a next step, please try the following:

Point 1: Remove all nodes from SH cluster:

1.1. Edit server.conf (in $SPLUNK_HOME/etc/) and comment out the [shclustering] stanza (a sketch is shown after this list).
1.2. Clean Raft
1.2.1 Stop the member: ./splunk stop
1.2.2 Clean the member's raft folder: ./splunk clean raft
1.2.3 Start the member: ./splunk start
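A minimal sketch of what the commented-out stanza in server.conf might look like after step 1.1 (the hostname, port, secret, and label below are placeholders, not values from the original environment):

# [shclustering]
# mgmt_uri = https://sh1.example.com:8089
# pass4SymmKey = <your_shcluster_secret>
# shcluster_label = shcluster1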

Point 2: Take KV store backup

2.1. Stop Splunk.
2.2. Take a backup of the DB (keep it safe; it may be needed in an emergency):
2.3. cp -r $SPLUNK_DB/kvstore/mongo /backup
2.4. Rename mongod.lock (location: $SPLUNK_DB/kvstore/mongo) to mongod.lock_bkp (see the example command after this list).
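Step 2.4 as a command, assuming the lock file sits in the mongo data directory shown in step 2.3:

mv $SPLUNK_DB/kvstore/mongo/mongod.lock $SPLUNK_DB/kvstore/mongo/mongod.lock_bkp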

Point 3: Clean up the KV store

3.1 Stop the member: ./splunk stop
3.2 Clean the member's kvstore: ./splunk clean kvstore --local
3.3 Start the member: ./splunk start

Point 4. Restore the KV store from the backup (see the note on paths below).
cp -r /backup $SPLUNK_DB/kvstore/mongo
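The exact restore command depends on how the backup directory was laid out in Point 2; the goal is for the MongoDB files to end up directly under $SPLUNK_DB/kvstore/mongo. For example, if the backup landed at /backup/mongo, something like the following would apply:

cp -r /backup/mongo/. $SPLUNK_DB/kvstore/mongo/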

Point 5. Check kvstore status on each node.
./splunk show kvstore-status

Point 6. Create the SH cluster again (Note: don't just re-create the cluster by uncommenting the shclustering stanza). Initialize all members of the shcluster:
./splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <replication_port> -replication_factor <n> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>
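For illustration only, with hypothetical hostnames, ports, and credentials (substitute your own values):

./splunk init shcluster-config -auth admin:changeme -mgmt_uri https://sh1.example.com:8089 -replication_port 9887 -replication_factor 3 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret mysecretkey -shcluster_label shcluster1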

Point 7. Bring up the cluster captain - Bootstrap first member
./splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
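Again with hypothetical values, run on the member that should become the initial captain:

./splunk bootstrap shcluster-captain -servers_list "https://sh1.example.com:8089,https://sh2.example.com:8089,https://sh3.example.com:8089" -auth admin:changeme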

Point 8. Check shcluster status
./splunk show shcluster-status -auth <username>:<password>

Point 9. Check KVstore status
./splunk show kvstore-status -auth <username>:<password>


suarezry
Builder

Great step-by-step instructions. Thank you.


anakor
Engager

Thank you! I can recommend using this solution!
