We are building a Splunk indexer cluster in a cloud environment and want to be able to scale it up and down as needed. So here is my question. Let's say I have 4 index peers with a replication factor of 3, and I introduce another peer to accommodate load. Three days later I no longer need this node, so I want to scale down and remove it. Will the cluster rebalance the buckets to maintain the replication factor? In other words, if the new node happens to hold part of a replicated index and I remove the node, will the Splunk master re-replicate those buckets to another node and so honor the replication factor?
To ask this more generally: what are the pitfalls of what we are trying to do? Who has tried this, and what was the outcome?
As always, any input is welcome, and thank you!
@brent_weaver - Did the answer provided by skalliger help provide a solution to your question? If yes, please don't forget to resolve this post by clicking "Accept". If no, please leave a comment with more feedback. Thanks!
Hi there,
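Yes, the cluster will rebalance itself whenever a peer is added or restarted (or when the master itself restarts). When a node goes down, bucket fixing kicks in and brings the number of bucket copies back up to the replication factor on the remaining peers (https://docs.splunk.com/Splexicon:Bucketfixing).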
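To make the scale-down case concrete, here is a minimal Python sketch of the idea behind bucket fixing. It is a toy model, not Splunk's actual algorithm: the function name `fix_buckets`, the peer names `idx1` to `idx5`, and the bucket names are all made up for illustration, and the real master also takes searchability, sites, and peer load into account.

```python
from typing import Dict, List, Set


def fix_buckets(copies: Dict[str, Set[str]], peers: List[str], rf: int) -> Dict[str, Set[str]]:
    """Toy model: top each bucket back up to rf copies on the live peers."""
    live = set(peers)
    fixed: Dict[str, Set[str]] = {}
    for bucket, holders in copies.items():
        # Copies on peers that left the cluster are gone.
        surviving = set(holders) & live
        if not surviving:
            # No copy left anywhere: nothing to replicate from.
            fixed[bucket] = set()
            continue
        # Add copies on peers that do not hold the bucket yet
        # (sorted only to make this toy example deterministic).
        for peer in sorted(live - surviving):
            if len(surviving) >= rf:
                break
            surviving.add(peer)
        fixed[bucket] = surviving
    return fixed


# Hypothetical layout: 5 peers, replication factor 3, then idx5 is removed.
copies = {
    "b1": {"idx1", "idx2", "idx5"},
    "b2": {"idx3", "idx4", "idx5"},
    "b3": {"idx1", "idx2", "idx3"},
}
remaining = ["idx1", "idx2", "idx3", "idx4"]
after = fix_buckets(copies, remaining, rf=3)

for bucket in sorted(after):
    print(bucket, sorted(after[bucket]))
```

The point of the sketch is only that removing a peer is recoverable as long as the remaining peer count is still at least the replication factor (4 >= 3 here) and at least one copy of every bucket survives.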
To make sure you don't end up with too many copies of your data, you can list and remove the excess bucket copies manually: https://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/Removeextrabucketcopies
Please also consider the search factor. A higher replication factor on its own only gives you more non-searchable copies, and depending on your search load you might want more searchable copies of your data as well.
Primary rebalancing kicks in whenever a peer joins or rejoins the cluster, and it redistributes the primary (searchable) copies of your data across the peers. For more information, see: http://docs.splunk.com/Documentation/Splunk/6.5.1/Indexer/Rebalancethecluster
As for your last question: the downside of adding and removing peers is that you cannot really influence primary rebalancing. If I understood it correctly, a new member may not have all the (searchable) copies as soon as you add it to your cluster. Also, replicating data takes time.
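As a quick illustration of how the two factors relate, here is a small Python sketch. The helper `copy_counts` is hypothetical, and the search factor of 2 is an assumed value (the question only states a replication factor of 3); the one rule it encodes is that the search factor cannot exceed the replication factor.

```python
def copy_counts(replication_factor: int, search_factor: int) -> dict:
    """Per-bucket copy counts implied by the two factors."""
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    return {
        "total": replication_factor,
        "searchable": search_factor,
        "non_searchable": replication_factor - search_factor,
    }


# Replication factor 3 as in the question, search factor 2 assumed.
print(copy_counts(3, 2))
# Raising only the replication factor adds a non-searchable copy.
print(copy_counts(4, 2))
```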
Any more questions?