Our indexer cluster has doubled in size over the last year. As we added members to the cluster, we never checked whether our current replication factor (3) is still sufficient for the number of indexers we have. We started two years ago with 8 indexers and now have 16, soon to be 20. Is there any documentation that shows the best practice for replication factor relative to the number of indexers? I would assume 3 is still good, but I wanted to challenge my assumption.
The right replication factor depends on several things. Is yours a multisite cluster, or are you running a single-site cluster? How many sites are involved in the cluster? What are your high-availability requirements? As one example, a replication factor of 3 across 2 sites (origin:2, total:3) works fine in many cases because it provides site-level HA. Increasing the replication factor also comes at the cost of storage, so you need to validate your requirements against the settings you have. For a detailed analysis of multisite clusters and how the replication factor setting comes into play, you can refer to the link below.
Replication factor is driven more by your business requirements than by performance. Growing your server footprint will have some impact, but the key drivers are:
- Do you have multiple sites (more than 2)? If yes, increasing the replication factor would be a good idea (e.g. with 4 sites, a replication factor of 4 may be better, with a copy of each bucket at each site).
- Is the connectivity between your sites a fast link? If not, make sure you enable search affinity so that users search the nearby site and don't need to search data on a remote site.
- Do you back up your data externally, outside Splunk? If yes, reducing the replication factor to 2 may be ideal (one copy of each bucket at each site, assuming you have 2 sites), so you can save Splunk cold-storage space and replication time.
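To put rough numbers on the storage side of that trade-off, here is a minimal Python sketch. It assumes the common planning rule of thumb that compressed rawdata takes about 15% of the ingested volume and index files about 35%. Those ratios, the function name `clustered_storage_gb`, and the 100 GB/day workload are illustrative assumptions, not figures from your environment, so measure your own buckets before relying on them.

```python
# Rough sketch: estimate total indexer-cluster disk usage for a given
# replication factor (RF) and search factor (SF).
# ASSUMPTION: rawdata_ratio and index_ratio are rule-of-thumb placeholders;
# real ratios vary with the data, so measure your own buckets.

def clustered_storage_gb(daily_ingest_gb, retention_days,
                         replication_factor, search_factor,
                         rawdata_ratio=0.15, index_ratio=0.35):
    """Return estimated total disk usage in GB across the whole cluster."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    # Every replicated copy of a bucket stores the compressed rawdata.
    rawdata_gb = daily_ingest_gb * rawdata_ratio * replication_factor
    # Only searchable copies also store the index files.
    index_gb = daily_ingest_gb * index_ratio * search_factor
    return (rawdata_gb + index_gb) * retention_days


if __name__ == "__main__":
    # Hypothetical workload: 100 GB/day kept for 90 days.
    for rf, sf in [(2, 2), (3, 2), (4, 2)]:
        total = clustered_storage_gb(100, 90, rf, sf)
        print(f"RF={rf} SF={sf}: ~{total:,.0f} GB")
```

In this model the total does not depend on the number of indexers. Adding peers spreads the same copies across more disks, which fits the point that replication factor is a question of availability and storage rather than cluster size.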
Reading through your post, I think 3 is still good.