We have a new SH node that we are trying to add to the search head cluster; we updated the shcluster configuration and the other related configs.
After adding this node to the cluster, we now have two nodes as part of the SH cluster.
Both nodes show as up and running members of the cluster when we check with "splunk show shcluster-status".
But when we check the KV store status with "splunk show kvstore-status", the old node shows as captain, while the newly built node is not joining the KV store cluster and is logging the error below.
Error in splunkd.log on the search head that has the issue:
12-04-2024 16:36:45.402 +0000 ERROR KVStoreBulletinBoardManager [534432 KVStoreConfigurationThread] - Local KV Store has replication issues. See introspection data and mongod.log for details. Cluster has not been configured on this member. KVStore cluster has not been configured
We have configured all the cluster-related settings on the newly built search head (server.conf) and don't see any configuration missing.
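For reference, this is a minimal sketch of the SHC-related stanzas we would expect in server.conf on the new member (the hostnames, label, port and pass4SymmKey value here are just examples, not our actual values):

    [shclustering]
    disabled = 0
    # management URI of this member
    mgmt_uri = https://search-head02:8089
    # deployer URI
    conf_deploy_fetch_url = https://deployer:8089
    shcluster_label = shcluster1
    # must match the key used by the existing member
    pass4SymmKey = <shared key>

    [replication_port://9777]
    disabled = 0

    [kvstore]
    # default KV store port
    port = 8191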
We also see the error below in the Messages tab of the SH UI:
Failed to synchronize configuration with KVStore cluster. Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: search-head01:8191; the following nodes did not respond affirmatively: search-head01:8191 failed with Error connecting to search-head01:8191 (172.**.***.**:8191) :: caused by :: compression disabled.
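Since the error points at mongod.log and port 8191, these are generic checks one can run (paths assume a default $SPLUNK_HOME installation; the hostname is an example):

    # look at the KV store's own log for replication / networking errors
    tail -50 $SPLUNK_HOME/var/log/splunk/mongod.log

    # confirm the KV store port is reachable from the other member
    nc -zv search-head01 8191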
Has anyone else faced this error before? We need some support here.
A cluster requires 3 or more nodes (odd counts only). Quorum requires 50% + 1 of the nodes to be in sync. With only 2 nodes there will never be quorum.
@dural_yyz Close, but not quite.
SHC uses the Raft algorithm. It will work with just two nodes, but it won't tolerate an outage of either node.
True, it needs a quorum to elect a leader, but a quorum can be obtained in a 2-node cluster by having the votes of both nodes. The problem starts when one node is down, because with just one node alive you can never get a quorum.
The same is true for any even number of nodes: quorum needs (N/2)+1 votes, so while an even-sized cluster can survive an outage of (N/2)-1 nodes, it cannot function with an even split, for example half the nodes in one datacenter, half in another, and a network outage between them (see the worked numbers below). Odd-sized clusters are therefore simply more cost-effective, because adding one more node to make the count even does not increase resilience.
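To make the arithmetic concrete, quorum = floor(N/2) + 1:

    N = 2 -> quorum 2 -> tolerates 0 node failures
    N = 3 -> quorum 2 -> tolerates 1 node failure
    N = 4 -> quorum 3 -> tolerates 1 node failure
    N = 5 -> quorum 3 -> tolerates 2 node failures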
Additionally, with Splunk's SHC you can enforce a statically assigned captain, bypassing the normal Raft election.
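If it helps, the static captain setup is done with something like the following (the URIs are examples; dynamic election should be switched back on once the cluster is healthy again):

    # on the node that should act as captain
    splunk edit shcluster-config -mode captain -captain_uri https://search-head01:8089 -election false

    # on each other member
    splunk edit shcluster-config -mode member -captain_uri https://search-head01:8089 -election false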
@HarishSamudrala Actually, an SHC consists of two "separate" clusters: one is your normal cluster formed by the splunkd processes, the other is a "hidden" cluster formed by the mongod (KV store) instances. Typically they share captaincy, but that is not a must. In your case it seems that, due to some communication problem, the KV store instances cannot reach each other, so you can't get them both to form a quorum and decide which one is the captain.
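Once the splunkd-level clustering looks healthy, a common next step is to reinitialize the KV store on the problem member so it rejoins and resyncs from the KV store captain. A rough sketch, assuming a recent Splunk version (take a backup first and check the docs for your version):

    # on the problem member, with splunkd stopped
    splunk stop
    splunk clean kvstore --local
    splunk start

    # or, with splunkd running, force a resync from the KV store captain
    splunk resync kvstore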