Knowledge Management

KV store stuck at the starting stage for all search heads in the cluster

dkolekar_splunk
Splunk Employee

I have a search head cluster environment and the kv-store is stuck at the starting stage for all the search heads.

Error: KV Store changed the status to failed. Failed to establish communication with KVStore. See splunkd.log for details.

When I removed a search head from the cluster, took a backup, cleaned the KV store, and restored the backup, the KV store status became "Ready". But when I added it back to the search head cluster, the KV store failed again.

Errors:

08-27-2019 05:37:01.026 -0500 ERROR KVStorageProvider - An error occurred during the last operation ('getServerVersion', domain: '15', code: '13053'): No suitable servers found (serverSelectionTryOnce set): [connection closed calling ismaster on 'hostname:8191']

08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf - cpu usage saturation"

08-27-2019 05:37:00.596 -0500 INFO SHCMaster - delegate search job requested for savedsearch_name="xvzf Collect - duplicated xvzf instances may occur (excessive nbr of process launched)"

How can I troubleshoot this issue further?

1 Solution

dkolekar_splunk
Splunk Employee

As a next action, please try the steps below:

Point 1: Remove all nodes from the SH cluster:

1.1. Edit server.conf (in $SPLUNK_HOME/etc/system/local/) and comment out the [shclustering] stanza.
1.2. Clean Raft:
1.2.1 Stop the member: ./splunk stop
1.2.2 Clean the member's raft folder: ./splunk clean raft
1.2.3 Start the member: ./splunk start
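For reference, step 1.1 on a member might look like the fragment below. This is a sketch only: the hostnames, label, and settings are placeholders, not values from this thread, and your stanza will contain whatever your deployment actually uses.

```ini
# $SPLUNK_HOME/etc/system/local/server.conf
# Comment out the entire [shclustering] stanza to detach this member:
#
# [shclustering]
# disabled = 0
# mgmt_uri = https://sh1.example.com:8089
# conf_deploy_fetch_url = https://deployer.example.com:8089
# pass4SymmKey = <redacted>
# shcluster_label = shcluster1
```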

Point 2: Take a KV store backup

2.1. Stop Splunk.
2.2. Take a backup of the DB (keep it safe; it may be needed in an emergency):
2.3. cp -r $SPLUNK_DB/kvstore/mongo /backup
2.4. Rename mongod.lock (location: $SPLUNK_DB/kvstore/mongo) to mongod.lock_bkp
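Steps 2.1–2.4 can be scripted. The sketch below uses mock paths under /tmp so it is runnable anywhere; the SPLUNK_DB and BACKUP_DIR values are assumptions for illustration. On a real member, point them at the actual locations, stop Splunk first, and drop the mock-setup lines.

```shell
#!/bin/sh
set -eu

# Assumed paths for this sketch; replace with the real $SPLUNK_DB and backup target.
SPLUNK_DB="${SPLUNK_DB:-/tmp/mock_splunk_db}"
BACKUP_DIR="${BACKUP_DIR:-/tmp/kvstore_backup}"

# Mock layout so the sketch runs end to end (remove these lines on a real system).
mkdir -p "$SPLUNK_DB/kvstore/mongo"
touch "$SPLUNK_DB/kvstore/mongo/mongod.lock"

# 2.2/2.3: copy the whole mongo directory to the backup location.
mkdir -p "$BACKUP_DIR"
cp -r "$SPLUNK_DB/kvstore/mongo" "$BACKUP_DIR/"

# 2.4: rename mongod.lock so a stale lock cannot block the next start.
mv "$SPLUNK_DB/kvstore/mongo/mongod.lock" \
   "$SPLUNK_DB/kvstore/mongo/mongod.lock_bkp"

echo "backup complete: $BACKUP_DIR/mongo"
```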

Point 3: Clean up the KV store

3.1 Stop the member: ./splunk stop
3.2 Clean the member's kvstore: ./splunk clean kvstore --local
3.3 Start the member: ./splunk start

Point 4. Restore the KV store from the backup (the source path matches the copy made in step 2.3):
cp -r /backup/mongo $SPLUNK_DB/kvstore/
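The restore in Point 4 is the inverse copy of the backup in Point 2. A runnable sketch against mock paths (assumed for illustration, not Splunk defaults); on a real member, stop Splunk before restoring and drop the mock-setup lines.

```shell
#!/bin/sh
set -eu

# Assumed mock paths; replace with the real $SPLUNK_DB and backup location.
SPLUNK_DB="${SPLUNK_DB:-/tmp/mock_splunk_db2}"
BACKUP_DIR="${BACKUP_DIR:-/tmp/kvstore_backup2}"

# Mock backup content standing in for the copy made in Point 2.
mkdir -p "$BACKUP_DIR/mongo"
echo "data" > "$BACKUP_DIR/mongo/collection-0.wt"

# Point 4: copy the backed-up mongo directory back under $SPLUNK_DB/kvstore/.
mkdir -p "$SPLUNK_DB/kvstore"
cp -r "$BACKUP_DIR/mongo" "$SPLUNK_DB/kvstore/"

echo "restore complete: $SPLUNK_DB/kvstore/mongo"
```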

Point 5. Check the KV store status on each node:
./splunk show kvstore-status

Point 6. Create the SH cluster again (note: don't just re-create the cluster by uncommenting the shclustering stanza). Initialize every member of the shcluster:
./splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <replication_port> -replication_factor <n> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>

Point 7. Bring up the cluster captain by bootstrapping the first member:
./splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>

Point 8. Check shcluster status
./splunk show shcluster-status -auth <username>:<password>

Point 9. Check KVstore status
./splunk show kvstore-status -auth <username>:<password>



suarezry
Builder

Great step-by-step instructions. Thank you.


anakor
Engager

Thank you! I can recommend using this solution!
