Splunk Search

Search Head Cluster captain election fails with error -NOT_LEADER CURRENT_STATE = FOLLOWER

sat94541
Communicator

We have a 5-node search head cluster (SHC) on Splunk version 6.3. The captain election is not succeeding.
We followed the steps and cleared _raft, but that did not help.
The steps taken were:

1) Stop all SHC members.
2) Clean _raft on all nodes: $SPLUNK_HOME/var/run/splunk/_raft
3) Restart all members.
4) Attempt to bootstrap all members using the command:

splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>

This failed with the error: SHPRaftConsensus - NOT_LEADER CURRENT_STATE = FOLLOWER

The splunkd.log has the following entries:

01-05-2016 19:35:53.658 -0500 INFO ServerConfig - My server name is "test5421.xx.test.com".
01-05-2016 19:35:53.659 -0500 INFO ServerConfig - My hostname is "test5421".
01-05-2016 19:40:37.058 -0500 INFO SHPRaftConsensus - stepDown(1)
01-05-2016 19:40:37.058 -0500 INFO SHPRaftConsensus - Activating configuration 1:\n<configuration>\n<prev_configuration>\n<server>\n<server_id>https://test5421.xx.test.com:8089
01-05-2016 19:41:03.430 -0500 INFO SHPRaftConsensus - Running for election in term 2
01-05-2016 19:41:03.431 -0500 INFO SHPRaftConsensus - Now leader for term 2
01-05-2016 19:41:03.431 -0500 INFO SHPRaftConsensus - New commitIndex: 2
01-05-2016 19:41:03.431 -0500 INFO SHPoolingMgr - Making node the captain
01-05-2016 19:41:03.431 -0500 INFO SHPoolingMgr - makeOrChangeSlave - master_shp = https://test5421.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - stepDown(7495)
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Activating configuration 1:\n<configuration>\n<prev_configuration>\n<server>\n<server_id>https://test5421.xx.test.com:8089</server_id>\n</server>\n</prev_configuration>\n...
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://test5422.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://testa9437.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://test9453.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - Exiting and deleting server : https://test9454.xx.test.com:8089
01-05-2016 19:41:03.613 -0500 INFO SHPoolingMgr - makeOrChangeSlave - master_shp = ?
01-05-2016 19:41:03.613 -0500 INFO SHPRaftConsensus - NOT_LEADER CURRENT_STATE = FOLLOWER

Note that in the above log we see "stepDown(1)" and "stepDown(7495)", which does not seem right.


rbal_splunk
Splunk Employee

It could be network issues causing append entries to fail while bootstrapping. Check splunkd.log.
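One way to do that check is to pull only the raft-related lines out of splunkd.log on each member. This is a minimal sketch: the helper name `raft_log_lines` is hypothetical, the two component names are taken from the log excerpt in the question, and the default log path in the usage comment is an assumption about a standard install.

```shell
# Hypothetical helper: print only the lines written by the
# SHPRaftConsensus and SHPoolingMgr components of a splunkd.log file.
raft_log_lines() {
  grep -E ' (SHPRaftConsensus|SHPoolingMgr) - ' "$1"
}

# Usage (assumed default log location):
# raft_log_lines "$SPLUNK_HOME/var/log/splunk/splunkd.log" | tail -n 50
```

Comparing the timestamps of these lines across members may show which node stepped down first.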


sat94541
Communicator

Here is what worked:

1) Stop all 5 SHC members.
2) Clean _raft on all nodes: $SPLUNK_HOME/var/run/splunk/_raft. NOTE: It needs to be cleaned from all nodes.
3) Restart all 5 SHC members.
4) Bootstrap one member initially.

Bootstrap one node using a command like the one below, then add the other members using add peer on the bootstrapped captain:

splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>" -auth <username>:<password>

Here is the reference for adding a peer:

http://docs.splunk.com/Documentation/Splunk/6.2.0/DistSearch/Addaclustermember#Add_the_instance
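
The sequence that worked could be scripted roughly as follows. This is only a sketch under assumptions: the `run` wrapper, the function names, the `DRY_RUN` switch (on by default, so nothing is executed), the `/opt/splunk` default, and the example host names and credentials are all made up for illustration. `splunk add shcluster-member -new_member_uri` is the CLI form of the "add peer" step as described in the linked documentation.

```shell
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"   # assumed install path

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1-3: run on every member. Finish each step on all members
# before moving on to the next one.
stop_member()  { run "$SPLUNK_HOME/bin/splunk" stop; }
clean_raft()   { run rm -rf "$SPLUNK_HOME/var/run/splunk/_raft"; }
start_member() { run "$SPLUNK_HOME/bin/splunk" start; }

# Step 4: run on ONE member only. $1 = that member's own URI:port,
# $2 = username:password.
bootstrap_single() {
  run "$SPLUNK_HOME/bin/splunk" bootstrap shcluster-captain \
    -servers_list "$1" -auth "$2"
}

# Then, on the bootstrapped captain, add each remaining member.
# $1 = new member URI:port, $2 = username:password.
add_member() {
  run "$SPLUNK_HOME/bin/splunk" add shcluster-member \
    -new_member_uri "$1" -auth "$2"
}

# Optional check once all members are added. $1 = username:password.
show_status() {
  run "$SPLUNK_HOME/bin/splunk" show shcluster-status -auth "$1"
}

# Example on the captain (placeholder values):
# bootstrap_single "https://sh1.example.com:8089" "admin:changeme"
# add_member "https://sh2.example.com:8089" "admin:changeme"
# show_status "admin:changeme"
```

Set DRY_RUN=0 only after reviewing the printed commands, since clean_raft deletes the _raft directory.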


rbal_splunk
Splunk Employee

When you clear _raft, make sure all the nodes are stopped and turned off.
Can you try bootstrapping just one member and then adding the others as peers using add peer on the bootstrapped captain?
