Our master node is still losing sight of indexer nodes. We previously had a similar problem that affected up to half of our cluster at a time. Currently, it is affecting 1 or 2 indexers at a time. The master reports the indexer as down, but the indexer is in fact up. This flapping of the
You have 20 indexers, and each indexer has around 25K buckets. So in total you have 20 * 25,000 = 500,000 buckets.
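As a quick sanity check on that arithmetic, here is a minimal Python sketch. The helper name `total_buckets` is only illustrative (it is not part of Splunk), and the inputs are the approximate figures quoted above, so treat the result as an estimate rather than an exact count.

```python
def total_buckets(indexer_count: int, buckets_per_indexer: int) -> int:
    """Estimate how many buckets the cluster master has to track.

    Hypothetical helper: it assumes every indexer holds roughly the
    same number of buckets, which is only an approximation.
    """
    return indexer_count * buckets_per_indexer


# Figures from the post: 20 indexers, about 25K buckets on each.
total = total_buckets(20, 25_000)
print(f"{total:,} buckets")  # prints "500,000 buckets"
```

Plugging in your own indexer count and per-indexer bucket count gives a rough idea of how much the cluster master has to manage.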
Use the searches here to see the Cluster Master Service Queue Progress. Run them on the Cluster Master, and remember to change the earliest and latest times to match the period you are reviewing.
I worked with Splunk Support to resolve this issue, and the following steps were recommended.
So there are a lot of buckets for the cluster master to manage. Normally, when the bucket count gets high, the cluster master has to do a lot of processing to stay in compliance with the replication factor (RF) and search factor (SF). Splunk works best with fewer but larger buckets.