Knowledge Management

KV store is stuck at the starting stage for all the search heads in the cluster.

dkolekar_splunk
Splunk Employee

I have a search head cluster environment, and the KV store is stuck at the starting stage on all the search heads.

Error: KV Store changed the status to failed. Failed to establish communication with KVStore. See splunkd.log for details. ..

When I removed a search head from the cluster, took a backup, cleaned the KV store, and restored the backup, the KV store status became "ready". But when I added it back to the search head cluster, the KV store failed again.

Errors:

08-27-2019 05:37:01.026 -0500 ERROR KVStorageProvider - An error occurred during the last operation ('getServerVersion', domain: '15', code: '13053'): No suitable servers found (serverSelectionTryOnce set): [connection closed calling ismaster on 'hostname:8191']

08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf - cpu usage saturation"

08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf Collect - duplicated xvzf instances may occur (excessive nbr of process launched)"

How can I troubleshoot this issue further?

1 Solution

dkolekar_splunk
Splunk Employee

As a next action, please try the steps below:

Point 1: Remove all nodes from the SH cluster:

1.1. Edit server.conf ($SPLUNK_HOME/etc/system/local/) and comment out the [shclustering] stanza (see the sketch after this list)
1.2. Clean Raft
1.2.1 Stop the member: ./splunk stop
1.2.2 Clean the member's raft folder: ./splunk clean raft
1.2.3 Start the member: ./splunk start
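For reference, step 1.1 amounts to something like the following in server.conf. This is a minimal sketch; the hostname and label are hypothetical placeholders, so comment out whatever settings your existing shclustering stanza actually contains:

#[shclustering]
#mgmt_uri = https://sh1.example.com:8089
#shcluster_label = shcluster1
#disabled = 0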

Point 2: Take a KV store backup (a combined sketch follows this list)

2.1. Stop Splunk: ./splunk stop
2.2. Take a backup of the DB (keep it safe; it may be needed in an emergency):
2.3. cp -r $SPLUNK_DB/kvstore/mongo /backup
2.4. Rename mongod.lock (location: $SPLUNK_DB/kvstore/mongo) to mongod.lock_bkp
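Put together, steps 2.1 through 2.4 look roughly like this (assuming $SPLUNK_DB is set and /backup exists; adjust the paths to your environment):

./splunk stop
cp -r $SPLUNK_DB/kvstore/mongo /backup
mv $SPLUNK_DB/kvstore/mongo/mongod.lock $SPLUNK_DB/kvstore/mongo/mongod.lock_bkp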

Point 3: Clean up the Splunk KV store

3.1 Stop the member: ./splunk stop
3.2 Clean the member's KV store: ./splunk clean kvstore --local
3.3 Start the member: ./splunk start

Point 4. Restore the KV store from the backup (assuming step 2.3 created /backup/mongo):
cp -r /backup/mongo $SPLUNK_DB/kvstore/
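If splunkd runs as a dedicated user, also make sure the restored files are still owned by that account, for example (the splunk:splunk owner here is an assumption; use whatever user runs Splunk in your environment):

chown -R splunk:splunk $SPLUNK_DB/kvstore/mongo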

Point 5. Check the KV store status on each node.
./splunk show kvstore-status

Point 6. Create the SH cluster again. (Note: don't just re-create the cluster by un-commenting the shclustering stanza.) Initialize all members of the shcluster:
./splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <replication_port> -replication_factor <replication_factor> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>
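For example, with purely hypothetical values (the hostnames, ports, secret, and label below are illustrative placeholders, not values from this thread):

./splunk init shcluster-config -auth admin:changeme -mgmt_uri https://sh1.example.com:8089 -replication_port 9887 -replication_factor 3 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret mysharedsecret -shcluster_label shcluster1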

Point 7. Bring up the cluster captain - bootstrap the first member:
./splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
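Again with hypothetical hostnames and credentials:

./splunk bootstrap shcluster-captain -servers_list "https://sh1.example.com:8089,https://sh2.example.com:8089,https://sh3.example.com:8089" -auth admin:changeme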

Point 8. Check the shcluster status:
./splunk show shcluster-status -auth <username>:<password>

Point 9. Check the KV store status:
./splunk show kvstore-status -auth <username>:<password>
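The exact output varies by Splunk version, but on a healthy member this command should report the local KV store as ready, i.e. a line like:

status : ready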



suarezry
Builder

Great step-by-step instructions. Thank you.


anakor
Engager

Thank you! I can recommend using this solution!
