Deployment Architecture

SHC member nodes status flickering on the 'indexer clustering' page

rahul_bhatia
New Member

Hello,

We are running Splunk version 7.1.3.

We have 2 SHCs connected to our indexers. For one of the SHCs, the SHC members keep flickering between 'Up' and 'Down' status on the 'Indexer Clustering' page.

One of the previous posts suggested increasing 'generation_poll_interval' from 5 to 60 seconds. In our case, 'generation_poll_interval' is at its default of 5 for members of both SHCs, yet the flickering status only happens for members of one SHC and not the other.

Any further inputs on this behavior would be appreciated.

Thanks


nickhills
Ultra Champion

You must be seeing errors in _internal for the SHC members which are at fault.
Can you post some of the messages you see?

If my comment helps, please give it a thumbs up!

codebuilder
Motivator

Based on the information you supplied, I suspect that you are running into a split-brain situation.
Search head clustering should include no fewer than 3 nodes.
The three nodes make a "decision" on who should be captain based on "votes".
When you have only two, it becomes nearly impossible for them to reach quorum and elect a captain, which can lead to the situation you describe.
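
To make the voting arithmetic concrete, here is a minimal sketch, assuming a simple majority rule (more than half of all configured members must vote for the same captain). The function names are hypothetical and this is not Splunk code; it only illustrates why a two-member cluster has no room for a failure.

```python
def majority(members: int) -> int:
    """Smallest number of votes that is more than half of the members."""
    return members // 2 + 1


def tolerable_failures(members: int) -> int:
    """How many members can be down while a majority can still be formed."""
    return members - majority(members)


for n in (2, 3, 5):
    print(f"{n} members: majority={majority(n)}, can lose {tolerable_failures(n)}")
```

With two members the majority is two, so losing either member (or the link between them) leaves no quorum. With three members, one can be lost and the remaining two can still agree.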

nickhills
Ultra Champion

I initially read it that way too, but I think the question means 2 separate SH clusters of x nodes.
Given the minimums you correctly state, that means at least 6 search head members, split across 2 SHCs.
At least that's my assumption.

codebuilder
Motivator

Yes, that is correct. Though it is technically possible to cluster two nodes, it is not good practice and leads to these types of issues. You need at least 3 nodes per SHC. Otherwise, you'll continue to have split-brain issues.

codebuilder
Motivator

For the record, split-brain is not unique to Splunk. You'll encounter it in any type of clustering with only two nodes. Two nodes can't establish quorum successfully (more often than not).

rahul_bhatia
New Member

Hi Nick,

So I am seeing the following message for one of the search peers:

ERROR DistributedPeerManagerHeartbeat - Status 502 while sending public key to cluster search peer
WARN DistributedPeerManagerHeartbeat - Send failure while pushing PK to search peer, Connect Timeout

Apparently, the SHC member nodes cannot connect to just this search peer on port 8089. It seems this is the culprit which is causing the fluctuations in the status.
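
For anyone wanting to confirm the same thing, a rough sketch of a reachability check that could be run from an SHC member is below. It assumes the search peers listen on the management port 8089 mentioned above. The function names are hypothetical, and a successful TCP connect only shows the port is reachable; it does not prove the public key exchange will succeed.

```python
import socket


def can_connect(host: str, port: int = 8089, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_peers(peers, port: int = 8089, timeout: float = 5.0):
    """Return the subset of peers that did not accept a TCP connection."""
    return [peer for peer in peers if not can_connect(peer, port, timeout)]


# Example usage (host names are placeholders, replace with your indexers):
# print(unreachable_peers(["idx01.example.com", "idx02.example.com"]))
```

Running this from a member of each SHC against the full list of indexers should show whether only one SHC has trouble reaching a given peer.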

I will get this rectified and see if this alleviates the problem.

Thanks!

nickhills
Ultra Champion

Sounds promising. Good luck

nickhills
Ultra Champion

If my answer helped, please consider accepting and/or upvoting so that other members of the community can see it was useful.

rahul_bhatia
New Member

As an update, there is a communication issue between the SHC nodes and just one of our 46 indexers.

This seems to be causing the fluctuation in the status.

Thanks for your responses. This has been marked as 'Accepted'.
