Deployment Architecture

SHC member node status flickering on the 'Indexer Clustering' page

rahul_bhatia
New Member

Hello,

We are running Splunk version 7.1.3.

We have 2 SHCs connected to our indexers. For one of the SHCs, the SHC members keep flickering between 'Up' and 'Down' status on the 'Indexer Clustering' page.

A previous post suggested increasing 'generation_poll_interval' from 5 to 60 seconds. In our case, 'generation_poll_interval' is at its default of 5 on the members of both SHCs, yet the flickering status affects only the members of one SHC and not the other.

Any further inputs on this behavior would be appreciated.

Thanks


codebuilder
SplunkTrust

Based on the information you supplied, I suspect that you are running into a split-brain situation.
A search head cluster should have no fewer than 3 members.
The members make a "decision" on who should be captain based on "votes".
With only two members, it becomes nearly impossible for them to reach quorum and elect a captain, which leads to the situation you describe.
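
To make the voting arithmetic concrete, here is a minimal sketch that assumes a simple-majority rule (more than half of all members must agree). The function names are hypothetical and are not part of any Splunk API.

```python
def majority(members: int) -> int:
    """Smallest number of votes that is more than half of `members`."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be unreachable while a majority can still form."""
    return members - majority(members)


# Compare cluster sizes under the assumed simple-majority rule.
for n in (2, 3, 5):
    print(f"{n} members: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under that assumption, a two-member cluster needs both members to agree and tolerates no failures, while a three-member cluster can lose one member and still form a majority.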

----
An upvote would be appreciated and Accept Solution if it helps!

nickhills
Ultra Champion

I initially read it that way too, but I think the question means 2 separate SH clusters of x nodes each.
Given the minimums you correctly state, that means at least 6 search head members, split across 2 SHCs.
At least that's my assumption.

If my comment helps, please give it a thumbs up!

codebuilder
SplunkTrust

Yes, that is correct. Though it is technically possible to cluster two nodes, it is not good practice and leads to these types of issues. You need at least 3 nodes per SHC. Otherwise, you'll continue to have split-brain issues.


codebuilder
SplunkTrust

For the record, split-brain is not unique to Splunk. You'll encounter it in any type of clustering with only two nodes. More often than not, two nodes can't establish quorum.


nickhills
Ultra Champion

You must be seeing errors in _internal for the SHC members which are at fault.
Can you post some of the messages you see?


rahul_bhatia
New Member

Hi Nick,

So I am seeing the following message for one of the search peers:

ERROR DistributedPeerManagerHeartbeat - Status 502 while sending public key to cluster search peer
WARN DistributedPeerManagerHeartbeat - Send failure while pushing PK to search peer, Connect Timeout
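
For anyone who wants to see which components are logging such messages, below is a rough sketch that tallies ERROR and WARN lines by component. It assumes each line has the shape "LEVEL Component - message" as in the two lines above; real splunkd.log lines carry a timestamp prefix, which the pattern simply skips over. The function name `tally` is hypothetical.

```python
import re
from collections import Counter

# Matches "ERROR <Component> - <message>" or "WARN <Component> - <message>"
# anywhere in a line, so a leading timestamp is ignored.
LINE_RE = re.compile(r"\b(ERROR|WARN)\s+(\w+)\s+-\s+")


def tally(lines):
    """Count log lines per (level, component) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "ERROR DistributedPeerManagerHeartbeat - Status 502 while sending public key to cluster search peer",
    "WARN DistributedPeerManagerHeartbeat - Send failure while pushing PK to search peer, Connect Timeout",
]

for (level, component), count in tally(sample).most_common():
    print(level, component, count)
```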

Apparently, the SHC member nodes cannot connect to just this one search peer on port 8089. This appears to be what is causing the fluctuations in the status.

I will get this rectified and see if this alleviates the problem.
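
A quick way to confirm this kind of problem from a SHC member is a plain TCP connection test against each search peer. The sketch below uses only the Python standard library; it checks that port 8089 accepts a connection and nothing more (no TLS or Splunk-level handshake). The helper names and the default timeout are assumptions for illustration.

```python
import socket


def can_connect(host, port=8089, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable(peers, port=8089, timeout=5.0):
    """Return the subset of peers that did not accept a TCP connection."""
    return [host for host in peers if not can_connect(host, port, timeout)]
```

Calling `unreachable(...)` with the list of indexer hostnames, run from each SHC member in turn, should show whether the same single peer fails from every member.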

Thanks!


nickhills
Ultra Champion

Sounds promising. Good luck


nickhills
Ultra Champion

If my answer helped, please consider accepting and/or upvoting so that other members of the community can see it was useful.


rahul_bhatia
New Member

As an update: there is a communication issue between the SHC nodes and just one of our 46 indexers.

This seems to be causing the fluctuation in the status.

Thanks for your responses. This has been marked as 'Accepted'.
