The "Rebalance the cluster" documentation states:
"If you add a new peer that does not currently have any bucket copies, rebalancing itself will not cause the peer to gain any copies."
So my question is, what if you want to rebalance buckets across all the search peers? For example, say you have 4 search peers at 80% utilization and you want to bring up a 5th peer. (Splunk suggests keeping at least 20% of disk space free.) Are you stuck with 4 indexers at 80% and one indexer at 0%? (Of course, over time this will eventually correct itself, but that depends on the retention policy; this may work for a 90-day retention policy, but not if it's more like a year.)
Is there a way to manually trigger bucket replication between two peers and then remove the copies from the original nodes? (Possibly using the Remove excess bucket copies process?)
Interested parties may want to refer to the enhancement request mentioned below.
This isn't great, but you can set things up so that more data flows to the new indexer than the old ones. Just be careful, so that 6 months from now you don't have all of your old data on the 5th indexer!
Indexer discovery lets the forwarders contact the cluster master to obtain a list of servers - and the cluster master can also tell the forwarders to "weight" their choice of indexers so that one indexer gets more of the inbound data than others...
http://docs.splunk.com/Documentation/Splunk/6.3.2/Indexer/indexerdiscovery
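For reference, a minimal sketch of what that setup looks like (the hostname, port, and pass4SymmKey values below are placeholders; check the docs for your version):

    # server.conf on the cluster master
    [indexer_discovery]
    pass4SymmKey = changeme_discovery_key
    # Weight traffic toward indexers with more free disk capacity,
    # so a new, mostly empty indexer gets a larger share of inbound data
    indexerWeightByDiskCapacity = true

    # outputs.conf on each forwarder
    [indexer_discovery:cluster1]
    pass4SymmKey = changeme_discovery_key
    master_uri = https://cluster-master.example.com:8089

    [tcpout:cluster1_group]
    indexerDiscovery = cluster1

    [tcpout]
    defaultGroup = cluster1_group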
An Enhancement Request has been created requesting this functionality - SPL-76841: Allow Splunk admins to rebalance buckets after adding new indexers into the cluster.
Still in the realm of dirty solutions: if you take one old indexer offline at a time, the cluster will need to create new replicas of its buckets, and the new indexer should be selected as the target.
After all replication stats turn green, bring that indexer back online and run remove excess buckets.
Splunk will keep the bucket copies that are already searchable on the new indexer rather than reverting to the copies from the indexer that was offline.
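Roughly, and assuming the standard cluster CLI, that sequence would look something like this (one old indexer at a time; exact syntax may vary by version):

    # On the old indexer you want to drain:
    splunk offline
    # The master now re-creates the missing copies; with a mostly empty
    # new peer in the cluster, it is the likely replication target.

    # Watch the master dashboard until replication/search factors are met,
    # then bring the old peer back:
    splunk start

    # On the cluster master, drop the now-surplus copies:
    splunk remove excess-buckets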
I'll also go with that suggestion of changing the RF... If you trust your backups, reduce the SF and RF to 1 during a maintenance window and run remove excess buckets. When it finishes, raise the RF and SF back to your defaults. Splunk will try to distribute buckets roughly evenly across all indexers.
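As a sketch (the factor values here are examples, and lowering RF/SF on a live cluster is risky, hence the maintenance window):

    # On the cluster master:
    splunk edit cluster-config -replication_factor 1 -search_factor 1
    splunk restart

    # Once the cluster settles, remove the copies exceeding the new RF/SF:
    splunk remove excess-buckets

    # Then raise the factors back to your defaults, e.g.:
    splunk edit cluster-config -replication_factor 3 -search_factor 2
    splunk restart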
That's something that Hadoop claims to do easily, but there is still no clean way to do it with Splunk.
I am not aware of a nice method to achieve it.
Manually moving the buckets would be messy, and the cluster master will complain.
Dirty workaround:
Why not disable the splunktcp inputs on the old indexers? All new data will go to the new one.
When it's full enough, re-enable the splunktcp inputs on the old indexers.
Sorry, I misread your original answer. Disable the inputs on the indexers with the highest disk utilization, which doesn't require a restart. Got it. Inbound S2S connections would be closed, and as long as indexer acknowledgment is enabled, no events should be lost and inbound events would be sent to the other indexers. The indexer with the disabled splunktcp input would continue to receive replicated buckets, just not any new buckets of its own, right? So that would reduce the disk growth by about 50% or so (depending on whether the replicated buckets are searchable or not).
I assumed that your forwarders already have load balancing enabled. You just need to disable the splunktcp input, via the UI, the CLI, or a config change plus reload.
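For example, something like this on the indexer whose input you want to pause (9997 is assumed to be your receiving port):

    # inputs.conf
    [splunktcp://9997]
    disabled = true

    # or via the CLI:
    splunk disable listen 9997
    # ...and later, to accept forwarder traffic again:
    splunk enable listen 9997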
It seems to me that if the master can tell a peer node "hey, go copy bucket x to node y," there should be some way to manually trigger that same behavior. (But I'm guessing it's not in the REST API, or at least not documented.)
Since we're talking about ugly workarounds... I thought about cranking up the replication factor temporarily, waiting for more buckets to get copied, then resetting the replication factor and removing excess buckets. Of course this has even more issues: (1) there's no guarantee where new buckets will end up, (2) there's no way to dictate which copies will be removed (the few buckets that do make it to the new box could be the ones the master chooses to remove), and (3) probably lots of restarts needed here too.
I thought about that, but there are a few potential pitfalls. First, going from spreading data across 4 indexers down to just 1 is pretty drastic. Second, changes to outputs.conf require a restart. 😞