I have *21 indexers* in my Splunk environment running in indexer cluster mode. After upgrading the whole site from version 6.3.1 to version 6.5.1, I have a problem with **data replication**. Some machines repeatedly go Down and stay in Pending status, then come back Up with Searchable status. This cycle occurs several times.
Could someone tell me what could be causing this? I have included two attachments to help.
You may want to consider going to 6.5.2 as there are two bugs that can impact a busy environment.
It is possible that either of these is causing contention that results in a peer timing out while trying to communicate with the Cluster Master. The result of that scenario is a peer whose "Status" keeps fluctuating.
If that doesn't correct the issue and you have more than 20-30K buckets per indexer, some timings may need to be adjusted, but I would strongly encourage you to upgrade first.
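As a rough sketch of the bucket-count check mentioned above: assuming you have already collected a bucket count per indexer by whatever means you prefer, something like this flags the peers above the lower end of the 20-30K range. The peer names and counts are made up for illustration, and `peers_over_bucket_threshold` is a hypothetical helper, not part of Splunk.

```python
def peers_over_bucket_threshold(bucket_counts, threshold=20000):
    """Return the names of peers whose bucket count exceeds the threshold.

    bucket_counts: dict mapping a peer name to its bucket count.
    threshold: defaults to 20000, the lower end of the 20-30K range.
    """
    return sorted(
        name for name, count in bucket_counts.items() if count > threshold
    )


# Illustrative numbers only, not measurements from a real cluster.
counts = {"idx01": 12500, "idx02": 31000, "idx03": 24000}
print(peers_over_bucket_threshold(counts))  # ['idx02', 'idx03']
```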
You'll need to keep indexers of the same cluster within one minor version of each other.
So 6.4.x indexers will work with 6.5.x indexers, but 6.3.x and 6.5.x indexers are not guaranteed to work together (and indeed, replication between 6.3.x and 6.5.x+ is intentionally broken).
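To make the "within one minor version" rule concrete, here is a minimal sketch that checks a set of peers pairwise. The peer names are hypothetical, the helper functions are my own, and treating different major versions as incompatible is an assumption on my part rather than something stated above.

```python
from itertools import combinations


def major_minor(version):
    """Return (major, minor) as integers from a string such as '6.5.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def within_one_minor(a, b):
    """True if versions a and b are at most one minor version apart."""
    a_major, a_minor = major_minor(a)
    b_major, b_minor = major_minor(b)
    # Assumption: differing major versions are treated as incompatible.
    return a_major == b_major and abs(a_minor - b_minor) <= 1


def incompatible_pairs(peer_versions):
    """Return the pairs of peer names that break the one-minor-version rule."""
    return [
        (p, q)
        for p, q in combinations(sorted(peer_versions), 2)
        if not within_one_minor(peer_versions[p], peer_versions[q])
    ]


# Hypothetical peers partway through a rolling upgrade.
peers = {"idx01": "6.5.1", "idx02": "6.4.3", "idx03": "6.3.1"}
print(incompatible_pairs(peers))  # [('idx01', 'idx03')]
```

Here the 6.4.3 peer is fine next to both neighbours, while the 6.3.1 and 6.5.1 peers are flagged as a pair.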