Solution: Our issues were related to RAFT protocol issues, mgmt_uri mismatches and appendentries errors on the existing members. At the suggestion of Splunk Support we set the SHC to a static captain, which bypasses RAFT, and afterwards everything was fine.

We edited /opt/splunk/etc/system/local/server.conf to include these entries under [shclustering], to increase the timeouts for RAFT and connectivity:

    captain_is_adhoc_searchhead = true
    cxn_timeout_raft = 6
    rcv_timeout_raft = 10
    send_timeout_raft = 10
    cxn_timeout = 120
    send_timeout = 120
    rcv_timeout = 120
    election_timeout_ms = 120000
    heartbeat_timeout = 120

Then we set a static captain as described here:
https://docs.splunk.com/Documentation/Splunk/8.0.4/DistSearch/Staticcaptain

On the captain:

    /opt/splunk/bin/splunk edit shcluster-config -mode captain -captain_uri https://captainuri:8089 -election false

On the members:

    /opt/splunk/bin/splunk edit shcluster-config -mode member -captain_uri https://captainuri:8089 -election false

At which point all nodes showed as cluster members successfully.

We then reverted to a dynamic captain, performed a bootstrap, and performed several rolling restarts to confirm the members behaved as expected.

Run on each member, captain last:

    /opt/splunk/bin/splunk edit shcluster-config -election true -mgmt_uri https://memberurl:8089

Then bootstrap to "rebuild" the member entries in the KV store:

    /opt/splunk/bin/splunk bootstrap shcluster-captain -servers_list <URI>:<management_port>,<URI>:<management_port>
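
For anyone with more than a couple of members, the comma-separated -servers_list value is easy to mistype. Here is a minimal sketch of a helper that builds it; build_servers_list and the example.com hostnames are my own placeholders, not part of Splunk or of our environment, so substitute your real management URIs.

```shell
#!/bin/sh
# Hypothetical helper (not a Splunk command): joins the management URIs
# passed as arguments into the comma-separated form that -servers_list takes.
build_servers_list() {
    list=""
    for uri in "$@"; do
        if [ -z "$list" ]; then
            list="$uri"            # first entry: no leading comma
        else
            list="$list,$uri"      # later entries: comma, no spaces
        fi
    done
    printf '%s\n' "$list"
}

# Example usage with placeholder hostnames (assumed, replace with your own):
# /opt/splunk/bin/splunk bootstrap shcluster-captain \
#     -servers_list "$(build_servers_list https://sh1.example.com:8089 https://sh2.example.com:8089 https://sh3.example.com:8089)"
```

Quoting the command substitution keeps the whole list as a single argument to the bootstrap command.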