I have *21 indexers* in my Splunk environment, running in indexer cluster mode. After upgrading the whole site from version 6.3.1 to version 6.5.1, I have a problem with **data replication**. Repeatedly, some machines go Down and stay in Pending status, and then come back Up with status Searchable. This cycle occurs several times.
Could someone tell me what might be causing this? I have added two attachments to help.
SPL-134427, SPL-133450: Splunk 6.5+ does full bundle replication every time - slowing down the system
SPL-131398, SPL-132804, SPL-132805, SPL-132807, SPL-132890: Search head cluster contention on Linux due to poor hashing inside OpenSSL's error container.
It is possible that either of these is causing contention that results in a peer timing out while trying to communicate with the Cluster Master. The result of that scenario is a peer whose "Status" fluctuates.
I would highly encourage you to upgrade first. If that doesn't correct the issue and you have more than 20-30K buckets per indexer, some timings may need to be adjusted.
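To judge whether the bucket-count condition applies to you, something like the following may help. It is only a minimal sketch: the function name, the threshold constant, and the sample numbers are hypothetical, and it assumes you have already collected a bucket count per indexer by whatever means you prefer (for example, from a `dbinspect` search grouped by `splunk_server`).

```python
from typing import Dict, List

# Lower bound of the "20-30K buckets per indexer" range mentioned above.
# Treating the lower bound as the cutoff is an assumption, not official guidance.
BUCKET_THRESHOLD = 20_000


def indexers_over_threshold(
    bucket_counts: Dict[str, int], threshold: int = BUCKET_THRESHOLD
) -> List[str]:
    """Return the names of indexers whose bucket count exceeds the threshold, sorted by name."""
    return sorted(
        name for name, count in bucket_counts.items() if count > threshold
    )


if __name__ == "__main__":
    # Hypothetical counts; replace with the numbers from your own cluster.
    sample = {"idx01": 12_500, "idx02": 27_300, "idx03": 31_000}
    for name in indexers_over_threshold(sample):
        print(f"{name}: {sample[name]} buckets, above {BUCKET_THRESHOLD}")
```

If no indexer is flagged, the timing adjustments are probably not the first thing to look at.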