This was the search head that kept failing:
splunk > /appl/splunk/bin/splunk show shcluster-status -auth admin:adminpassword
Encountered some errors while trying to obtain sh cluster status.
This node is not the captain of the search head cluster, and we could not determine the current captain. The cluster is either in the process of electing a new captain, or this member hasn't joined the pool
splunk > /appl/splunk/bin/splunk show shcluster-status -auth admin:Adm\!n4Splk
By contrast, the other two SHs look fine when I run the show shcluster-status command from the CLI:
Captain:
    dynamic_captain : 1
    elected_captain : Tue Jun 4 12:47:03 2019
    id : A6F265E7-5FEC-448A-9ACD-8FE901D045D5
    initialized_flag : 0
    label : pgv013d27
    mgmt_uri : https://172.26.42.160:8089
    min_peers_joined_flag : 0
    rolling_restart_flag : 0
    service_ready_flag : 0
Members:
    pgv013aba
        label : pgv013aba
        last_conf_replication : Tue Jun 4 14:04:23 2019
        mgmt_uri : https://172.26.96.216:8089
        mgmt_uri_alias : https://172.26.96.216:8089
        status : Up
    pgv013d27
        label : pgv013d27
        mgmt_uri : https://172.26.42.160:8089
        mgmt_uri_alias : https://172.26.42.160:8089
        status : Up
I checked the forum and made sure that mgmt_uri was set correctly in server.conf.
It worked for a while, but then it started failing again.
I can see that a captain has been elected and is up and running, so I'm not sure why the failing SH reports that it cannot determine the current captain. Any advice on where else I should check?
Thanks.
First things first: check splunkd.log for errors. It looks like a communication problem between the nodes. If all else fails, reinstall the node from scratch and join it to the cluster again as an SHC member.
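A rough sketch of that log check, in case it helps. The install path is an assumption taken from the /appl/splunk/bin/splunk command in the question, and shc_log_errors is just a hypothetical helper name. The search terms are a guess at what clustering messages tend to contain, so widen them if nothing shows up.

```shell
# Assumption: SPLUNK_HOME is /appl/splunk, inferred from the command in the question.
SPLUNK_HOME="${SPLUNK_HOME:-/appl/splunk}"

# Hypothetical helper: print the last 50 ERROR/WARN lines that mention
# search head clustering, raft, or the captain.
# Usage: shc_log_errors [path/to/splunkd.log]
shc_log_errors() {
    log="${1:-$SPLUNK_HOME/var/log/splunk/splunkd.log}"
    grep -E ' (ERROR|WARN) ' "$log" | grep -iE 'shc|raft|captain' | tail -n 50
}
```

Run shc_log_errors on the failing SH first, then on the captain, and compare the timestamps around the point where the member dropped out.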
Within server.conf, make sure that pass4SymmKey is set correctly under both the [general] and [shclustering] stanzas on the failing node, then restart Splunk and check the cluster status again.
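Something like the following could confirm that the key is present in both stanzas. It is only a sketch: conf_has_key and check_shc_keys are hypothetical helper names, and the etc/system/local/server.conf location is an assumption, since the setting may live in another config layer on your node. It only checks that the key exists. The stored value is encrypted per node after a restart, so it cannot be compared across members by eye.

```shell
# Hypothetical helper: succeed if KEY is set under [STANZA] in FILE.
# Usage: conf_has_key FILE STANZA KEY
conf_has_key() {
    awk -v stanza="[$2]" -v key="$3" '
        /^\[/ { in_stanza = ($0 == stanza) }                     # track the current stanza header
        in_stanza && $0 ~ ("^[ \t]*" key "[ \t]*=") { found = 1 } # key assigned inside that stanza
        END { exit !found }
    ' "$1"
}

# Hypothetical helper: report pass4SymmKey presence in both stanzas.
# Usage: check_shc_keys [path/to/server.conf]
check_shc_keys() {
    # Assumption: the local server.conf under /appl/splunk holds the setting.
    conf="${1:-${SPLUNK_HOME:-/appl/splunk}/etc/system/local/server.conf}"
    for stanza in general shclustering; do
        if conf_has_key "$conf" "$stanza" pass4SymmKey; then
            echo "[$stanza] pass4SymmKey is set"
        else
            echo "[$stanza] pass4SymmKey is MISSING"
        fi
    done
}
```

Call check_shc_keys on the failing node. To see the merged configuration across all layers rather than a single file, splunk btool server list shclustering --debug should show which file each setting comes from. If the key is in doubt, re-enter it in plain text on the failing node and restart.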
I am having the same error. I tried your solution, but unfortunately it does not work.