Why is search head cluster not showing the search head members?

dhawal_sanghvi · 04-10-2017

I am not able to see my search head members in the Members list. I have 2 search head nodes, one acting as captain (xxxxxxx02) and the other acting as a member. But the member is not listed when I run show shcluster-status (output provided below) from the member node (xxxxxx01). I am not sure whether the member is part of the search head cluster.

[root@xxxxx01 users]# /opt/splunk/bin/splunk show shcluster-status
In handler 'shclusterstatus': Node is not captain. Current captain = https://xxxxx02:8089

[root@xxxxxx01 users]# /opt/splunk/bin/splunk show shcluster-status

Captain:
    dynamic_captain : 1
    elected_captain : Mon Apr 10 14:17:22 2017
    id : 85C9FA62-9AE7-47E5-A7D6-D114C2B15BCC
    initialized_flag : 0
    label : xxxxxxxx02
    mgmt_uri : https://xxxxx02:8089
    min_peers_joined_flag : 0
    rolling_restart_flag : 0
    service_ready_flag : 0

Members:
    xxxxxxx02
        label : xxxxxxx02
        mgmt_uri : https://xxxx02:8089
        mgmt_uri_alias : https://xxxxx02:8089
        status : Up

I get the same status output from the captain node (xxxxx02): it does not list the server xxxxx01 as a member. How do I confirm that clustering is set up properly and that both nodes are part of the search head cluster?

dhawal_sanghvi · 04-11-2017

I now have 3 nodes (2 search head members and 1 captain), but I still don't get the SHC members listed when I run the status command from the members. It shows only the captain, as below.

[root@muw1splmonpin01 cortana]# /opt/splunk/bin/splunk show shcluster-status

Captain:
    dynamic_captain : 1
    elected_captain : Tue Apr 11 10:25:15 2017
    id : 85C9FA62-9AE7-47E5-A7D6-D114C2B15BCC
    initialized_flag : 0
    label : muw1splmonpin02
    mgmt_uri : https://10.142.98.6:8089
    min_peers_joined_flag : 0
    rolling_restart_flag : 0
    service_ready_flag : 0

Members:
    muw1splmonpin02
        label : muw1splmonpin02
        mgmt_uri : https://10.142.98.6:8089
        mgmt_uri_alias : https://10.142.98.6:8089
        status : Up

**Splunkd.log from the captain node showing errors:**

04-11-2017 10:30:12.530 +0000 INFO SHCMaster - event=heartbeat guid=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 msg='signaling Initial-Add (received heartbeat from Down peer)'

04-11-2017 10:30:12.559 +0000 ERROR SHCMasterPeerHandler - Cannot add peer=10.142.98.5 mgmtport=8089 (reason: removeOldPeer peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868, serverName=muw1splmonpin01, hostport=10.142.98.5:8089, but found different peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 with serverName=muw1splmonpin02 and hostport=10.142.98.6:8089 already registered and UP)

04-11-2017 10:30:16.435 +0000 INFO SHCMaster - event=heartbeat guid=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 msg='signaling Initial-Add (received heartbeat from Down peer)'

04-11-2017 10:30:16.445 +0000 ERROR SHCMasterPeerHandler - Cannot add peer=10.142.98.7 mgmtport=8089 (reason: removeOldPeer peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868, serverName=muw1splmonpin03, hostport=10.142.98.7:8089, but found different peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 with serverName=muw1splmonpin02 and hostport=10.142.98.6:8089 already registered and UP)

**Splunkd.log from one of the members showing a warning and an error:**

04-11-2017 10:35:38.132 +0000 INFO SHCSlave - event=SHPSlave::addPreexistingArtifacts alive_sids=0 done_sids=0 notdone_sids(skipped)=0 artifacts=0 replicas=0
04-11-2017 10:35:38.136 +0000 WARN SHCMasterHTTPProxy - Low Level http request failure err=failed method=POST path=/services/shcluster/captain/members captain=10.142.98.6:8089 rc=0 actual_response_code=500 expected_response_code=201 status_line="Internal Server Error" transaction_error="\n \n \n In handler 'shclustercaptainmembers': Cannot add peer=10.142.98.7 mgmtport=8089 (reason: removeOldPeer peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868, serverName=muw1splmonpin03, hostport=10.142.98.7:8089, but found different peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 with serverName=muw1splmonpin02 and hostport=10.142.98.6:8089 already registered and UP)\n \n\n"

04-11-2017 10:35:38.136 +0000 INFO SHCSlave - readd + haveMinPeersJoined failed err=failed method=POST path=/services/shcluster/captain/members captain=10.142.98.6:8089 rc=0 actual_response_code=500 expected_response_code=201 status_line="Internal Server Error" transaction_error="\n \n \n In handler 'shclustercaptainmembers': Cannot add peer=10.142.98.7 mgmtport=8089 (reason: removeOldPeer peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868, serverName=muw1splmonpin03, hostport=10.142.98.7:8089, but found different peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 with serverName=muw1splmonpin02 and hostport=10.142.98.6:8089 already registered and UP)\n \n\n" but proxy is connected. Either add-peer failed on captain, or we must be one of the early members joining the new captain
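In the log excerpts above, the captain rejects each add request because the joining node presents a peer GUID that is already registered under a different server name. One possible reading is that the instances share a single GUID. A minimal Python sketch for pulling those fields out of splunkd.log lines follows; the regular expression is modeled only on the message text quoted in this thread, and the function name `servers_per_guid` is hypothetical, not part of any Splunk tooling.

```python
import re
from collections import defaultdict

# Pattern modeled on the "Cannot add peer" reason text quoted in this thread.
# It is an assumption that other Splunk versions use the same wording.
PEER_CONFLICT = re.compile(
    r"removeOldPeer peer=(?P<guid>[0-9A-F-]+), serverName=(?P<new_name>[^,]+), "
    r"hostport=(?P<new_hostport>[^,]+), but found different peer=(?P<old_guid>[0-9A-F-]+) "
    r"with serverName=(?P<old_name>\S+) and hostport=(?P<old_hostport>\S+) already registered"
)


def servers_per_guid(lines):
    """Return {guid: sorted server names} for GUIDs seen under more than one name."""
    seen = defaultdict(set)
    for line in lines:
        match = PEER_CONFLICT.search(line)
        if match:
            seen[match["guid"]].add(match["new_name"])
            seen[match["old_guid"]].add(match["old_name"])
    return {guid: sorted(names) for guid, names in seen.items() if len(names) > 1}


# The two captain-side error messages from the thread, shortened to the reason text.
sample = [
    "Cannot add peer=10.142.98.5 mgmtport=8089 (reason: removeOldPeer "
    "peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868, serverName=muw1splmonpin01, "
    "hostport=10.142.98.5:8089, but found different "
    "peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 with serverName=muw1splmonpin02 "
    "and hostport=10.142.98.6:8089 already registered and UP)",
    "Cannot add peer=10.142.98.7 mgmtport=8089 (reason: removeOldPeer "
    "peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868, serverName=muw1splmonpin03, "
    "hostport=10.142.98.7:8089, but found different "
    "peer=4FCA0F9D-CA10-4A83-8593-3CC0EBDB2868 with serverName=muw1splmonpin02 "
    "and hostport=10.142.98.6:8089 already registered and UP)",
]

conflicts = servers_per_guid(sample)
for guid, names in conflicts.items():
    print(guid, "is reported for:", ", ".join(names))
```

Run against these two lines, the sketch reports one GUID associated with muw1splmonpin01, muw1splmonpin02, and muw1splmonpin03.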

pradeepkumarg · 04-10-2017

As @somesoni2 mentioned, you need at least 3 members to form a cluster.

Required number of instances

The cluster must contain at a minimum the number of members needed to fulfill both of these requirements:

- Three members, so that the cluster can continue to function if one member goes down. See "Captain election process has deployment implications."
- The replication factor number of instances. See "Choose the replication factor for the search head cluster."

For example, if your replication factor is either 2 or 3, you need at least three instances. If your replication factor is 5, you need at least five instances.

You can optionally add more members to boost search and user capacity.

http://docs.splunk.com/Documentation/Splunk/6.5.3/DistSearch/SHCsystemrequirements
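The rule quoted above amounts to taking the larger of three and the replication factor. A minimal Python sketch of that arithmetic follows; the function name `min_shc_members` is hypothetical and only illustrates the quoted requirement, it is not a Splunk API.

```python
def min_shc_members(replication_factor: int) -> int:
    """Smallest member count satisfying both requirements quoted from the docs:
    at least three members, and at least `replication_factor` instances."""
    if replication_factor < 1:
        raise ValueError("replication factor must be a positive integer")
    return max(3, replication_factor)


# The examples given in the quoted documentation.
for rf in (2, 3, 5):
    print(f"replication factor {rf}: at least {min_shc_members(rf)} members")
```

With a replication factor of 2 or 3 this gives three members, and with 5 it gives five, matching the examples in the quoted text.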

splunker12er · 03-22-2018

I have a 3-node SH cluster. I want to confirm: if 2 of my nodes fail, can the cluster (with the 1 remaining node) still accept new search requests?

somesoni2 · 04-10-2017

A search head cluster requires a minimum of 3 members. Do you have 3 nodes, or are you trying with just two?

dhawal_sanghvi · 04-10-2017

I am using only 2 search heads, one as captain and one as a member.
