Deployment Architecture

Which works best in an SHC to keep the SHC Service available: an even or an odd number of search heads?

rholm01
Explorer

I had a multi-site SHC with 1 search head at one site and 3 at the other. The documentation I've found recommends an odd number at each site. One of my 3 servers was marked "down" by the Deployer, but the SHC Service was still working. The failing server was then rebuilt from a backup taken before it was configured as part of the SHC, so when it came back up it was no longer configured for the cluster and the SHC Service broke. The single server at my site1 was configured as the captain, so I am trying to figure out why this failed. Why would restarting Splunk on the Deployer and/or on the search heads cause the SHC Service to fail if the server running the captain is not having any issues? And what can be done to prevent this from happening in the future?


codebuilder
SplunkTrust

In any clustered environment, Splunk or otherwise, you must have an odd number of cluster members in order to prevent split-brain situations.
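The odd-number advice comes down to majority (Raft-style) quorum math, which SHC captain election uses: a captain can only be elected while a strict majority of the configured members is reachable. A minimal sketch (generic quorum arithmetic, not a Splunk API) shows why adding a fourth member buys no extra fault tolerance:

```python
# Majority-quorum arithmetic, as used by Raft-style clusters
# such as an SHC for captain election.

def majority(members: int) -> int:
    """Smallest number of live members that still forms a quorum."""
    return members // 2 + 1

def tolerated_failures(members: int) -> int:
    """How many members can fail before captain election stalls."""
    return members - majority(members)

for n in (3, 4, 5):
    print(f"{n} members: quorum={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
# 3 members: quorum=2, tolerated failures=1
# 4 members: quorum=3, tolerated failures=1
# 5 members: quorum=3, tolerated failures=2
```

Note that 3 and 4 members both tolerate only one failure, so the even fourth member adds load capacity but no election resilience. It also explains the original 1+3 multi-site layout failing: with 4 total members, losing two (the rebuilt server plus any restart elsewhere) drops the cluster below the quorum of 3.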

----
An upvote would be appreciated and Accept Solution if it helps!

rholm01
Explorer

The SHC Service expects 3 servers, one of which was broken. Search head replication for the remaining two was fine and search was not impacted.

splunk apply shcluster-bundle --answer-yes -target https://MY_SH2.mydomain.com:8089 -auth admin:XXXXXX

Error when issuing rolling restart on the master: Internal Server Error{"messages":[{"type":"ERROR","text":"Rolling restarted cannot be started without service_ready_flag = 1, check status through \"splunk show shcluster-status\". Reason :Waiting for 3 peers to register. (Number registered so far: 2)"}]}
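That error means the captain is still waiting for the rebuilt member to register, so `service_ready_flag` stays 0 and rolling restarts are blocked. One recovery path, sketched here under the assumption that the rebuilt server must rejoin from scratch (hostnames, ports, and credentials are placeholders from this thread, not verified values), is to re-initialize it and re-add it to the cluster:

```shell
# Run on the rebuilt search head: re-create its SHC configuration.
# -secret must match the cluster's existing shcluster key.
splunk init shcluster-config -auth admin:XXXXXX \
    -mgmt_uri https://MY_SH3.mydomain.com:8089 \
    -replication_port 9887 \
    -secret <shcluster_key>
splunk restart

# Still on the rebuilt member: join by pointing at any healthy member.
splunk add shcluster-member \
    -current_member_uri https://MY_SH2.mydomain.com:8089

# From any member, confirm all 3 peers have registered and a
# captain is elected before retrying the bundle push.
splunk show shcluster-status
```

If the old entry for the rebuilt server lingers as a stale member, removing it first from the captain (`splunk remove shcluster-member`) before re-adding can help the peer count converge.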
