Search head cluster rolling restart issue

mwdbhyat
Builder

Hi,

I am having an issue with my SH cluster. It was working fine, but now there are no members. The captain is elected dynamically, and all of the _flag options show 0 in the status output. It seems as though none of the peers want to join, yet there are no errors in splunkd that point to a problem. If it were an issue with a pass4SymmKey change, surely that would show up in the logs?

Any thoughts ?
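One quick way to look for SHC- or key-related trouble is to grep splunkd.log directly. This is only a sketch: the SPLUNK_HOME default and the search terms are assumptions, so adjust them for your environment.

```shell
# Scan splunkd.log for search-head-clustering and pass4SymmKey messages.
# SPLUNK_HOME default is an assumption; override it if your install differs.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}
grep -iE 'pass4SymmKey|shclustering|SHC' \
  "$SPLUNK_HOME/var/log/splunk/splunkd.log" | tail -n 20
```

An empty result here does not prove the cluster is healthy; a silent member can still be failing to join.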

1 Solution

mwdbhyat
Builder

Fixed using the following method: rebuilt the SHC using the KV store from one of the members.

I followed these steps:

1. Stop all SHC members and back up the kvstore on each:
   cd $SPLUNK_HOME/var/lib/splunk/kvstore
   tar cvfz kvstore-.tar.gz *
   Move the archive to a safe place.
2. Remove the cluster configuration from all members. Because some hosts were not part of the cluster, and some that had been members no longer existed, we could not use the remove CLI; instead, delete the [shclustering] stanza from server.conf.
3. Clean the raft and mongod folders:
   rm -rf $SPLUNK_HOME/var/run/splunk/_raft/
   rm -rf $SPLUNK_HOME/var/lib/splunk/kvstore/mongo/*
4. Verify all members have replication_factor = 3:
   $SPLUNK_HOME/bin/splunk btool server list shclustering | grep replication_factor
5. Start all members:
   $SPLUNK_HOME/bin/splunk start
6. Initialize all members (note: use the command for deploying an SHC with an indexer cluster if the SHC is part of an IDX cluster):
   splunk init shcluster-config -auth admin:changed -mgmt_uri https://sh1.example.com:8089 -replication_port 34567 -replication_factor 3 -conf_deploy_fetch_url https://:8089 -secret mykey -shcluster_label shcluster1
7. ONLY on the member that will become the first captain, restore the kvstore:
   $SPLUNK_HOME/bin/splunk stop
   cd $SPLUNK_HOME/var/lib/splunk/kvstore
   tar xvfz /kvstore-.tar.gz
   $SPLUNK_HOME/bin/splunk clean kvstore --cluster
   $SPLUNK_HOME/bin/splunk start
8. Bootstrap the first member:
   splunk bootstrap shcluster-captain -servers_list ":" -auth :
9. Verify the kvstore is working and available:
   splunk show shcluster-status -auth :
10. Add the rest of the members to the cluster:
    splunk add shcluster-member -current_member_uri "https://hostname:mgmt_port"
    (If it is a new member, change "current" to "new"; the "new" form must be run from the captain, while the "current" form must be run from the joining host itself.) Afterwards, a resync of the config bundle may be needed; check the status in the DMC. If so, run:
    /opt/splunk/bin/splunk resync shcluster-replicated-config
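The backup-and-restore portion of the steps above can be sketched as a small script. It runs against a sandbox directory by default so nothing real is touched; SPLUNK_HOME, the archive path, and the placeholder files are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch of the kvstore backup / clean / restore sequence from the steps
# above. By default it builds a throwaway stand-in directory; point
# SPLUNK_HOME at a real (stopped) install to adapt it for real use.
set -eu
SPLUNK_HOME=${SPLUNK_HOME:-$(mktemp -d)/splunk}
STAMP=$(date +%Y%m%d)

# Create a stand-in kvstore tree (sandbox mode only).
mkdir -p "$SPLUNK_HOME/var/lib/splunk/kvstore/mongo" \
         "$SPLUNK_HOME/var/run/splunk/_raft"

# 1. Back up the kvstore directory (run while Splunk is stopped).
tar czf "/tmp/kvstore-$STAMP.tar.gz" -C "$SPLUNK_HOME/var/lib/splunk" kvstore

# 2. Clean raft state and mongod data before re-initializing the cluster.
rm -rf "$SPLUNK_HOME/var/run/splunk/_raft" \
       "$SPLUNK_HOME/var/lib/splunk/kvstore/mongo"

# 3. On the future captain only: restore the kvstore from the archive.
tar xzf "/tmp/kvstore-$STAMP.tar.gz" -C "$SPLUNK_HOME/var/lib/splunk"
ls "$SPLUNK_HOME/var/lib/splunk/kvstore"
```

Using `tar -C` keeps the archive paths relative to the kvstore's parent directory, so the restore lands in the right place regardless of where the script is run from.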


lycollicott
Motivator

Can you post the results of splunk show shcluster-status?


mwdbhyat
Builder

Captain:
dynamic_captain : 1
elected_captain : captain
id : id
initialized_flag : 0
label : label
mgmt_uri : uri
min_peers_joined_flag : 0
rolling_restart_flag : 0
service_ready_flag : 0

Members:

It doesn't list anything. When I restart the captain, a new captain is elected as normal, but it never displays any members and no members join. Nothing in the DMC either.
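For quickly comparing the flags across members, the status output above can be filtered down to just the `_flag` fields. This is a hedged sketch: it assumes the command output was saved to a file named shcluster-status.txt (a hypothetical name).

```shell
# Extract the *_flag fields from saved `splunk show shcluster-status`
# output. A healthy cluster eventually shows service_ready_flag : 1.
# The input filename is an assumption; capture the output however you like.
awk -F' : ' '/_flag/ { gsub(/^[ \t]+/, "", $1); print $1 "=" $2 }' shcluster-status.txt
```

Running this on each member makes it obvious whether the members disagree about cluster state, which is what an all-zero flag set with no member list suggests.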
