Deployment Architecture

How do you balance disk usage across cluster peers after adding a new indexer?

Lowell
Super Champion

The "Rebalance the cluster" documentation states:

 .. If you add a new peer that does not currently have any bucket copies, rebalancing itself will not cause the peer to gain any copies.

So my question is, what if you want to rebalance buckets across all the search peers? For example, say you have 4 search peers at 80% disk utilization and you want to bring up a 5th peer. (Splunk suggests keeping at least 20% of the disk free.) Are you stuck with 4 indexers at 80% and one indexer at 0%? (Of course, over time this will eventually correct itself, but how long that takes depends on the retention policy; waiting may work for a 90-day retention policy, but not if retention is something like a year.)
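
To put rough numbers on the "over time this will correct itself" point, here is a minimal Python sketch. It assumes a simplified model that is not taken from the Splunk documentation: constant daily ingest, data spread evenly across whichever peers exist, buckets aging out exactly at the retention limit, and identical disks. The function name `utilization_after` and its parameters are hypothetical.

```python
def utilization_after(days, retention_days, old_peers=4, start_util=0.80):
    """Estimate disk utilization of an old peer and the new peer.

    Assumed model (not Splunk's actual behavior): ingest is constant,
    new data is split evenly across all peers, and data is deleted
    exactly when it reaches retention_days of age.
    """
    d = min(days, retention_days)
    total_peers = old_peers + 1

    # Disk fraction an old peer gained per day before the new peer joined.
    per_day_before = start_util / retention_days
    # Disk fraction every peer gains per day once ingest is split 5 ways.
    per_day_after = per_day_before * old_peers / total_peers

    # Old peer: data written before the change that has not aged out yet,
    # plus data written since the change.
    old_util = per_day_before * (retention_days - d) + per_day_after * d
    # New peer: only data written since the change.
    new_util = per_day_after * d
    return old_util, new_util


for retention in (90, 365):
    for day in (0, 30, 90, 365):
        old, new = utilization_after(day, retention)
        print(f"retention={retention:3d}d day={day:3d}: "
              f"old peers {old:.0%}, new peer {new:.0%}")
```

Under these assumptions the peers only converge once a full retention period has passed, which is why a one-year retention policy makes waiting it out unattractive.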

Is there a way to manually trigger bucket replication between two peers and then remove the copies from the original nodes? (Possibly using the Remove excess bucket copies process?)

lguinn2
Legend

This isn't great, but you can set things up so that more data flows to the new indexer than to the old ones. Just be careful, so that 6 months from now you don't have all of your old data on the 5th indexer!

Indexer discovery lets the forwarders contact the cluster master to obtain a list of servers - and the cluster master can also tell the forwarders to "weight" their choice of indexers so that one indexer gets more of the inbound data than others...

http://docs.splunk.com/Documentation/Splunk/6.3.2/Indexer/useforwarders#Advantages_of_the_indexer_di...

http://docs.splunk.com/Documentation/Splunk/6.3.2/Indexer/indexerdiscovery
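
As a rough illustration of what "weighting" the forwarders' choice means, here is a small Python sketch of weighted random selection. It is not Splunk's actual load-balancing algorithm; the indexer names, the weights, and the helper functions `expected_shares` and `simulate_picks` are all hypothetical.

```python
import random
from collections import Counter


def expected_shares(weights):
    """Fraction of connections each indexer should receive on average."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def simulate_picks(weights, connections, seed=0):
    """Simulate forwarders picking an indexer by weighted random choice."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=connections)
    return Counter(picks)


# Hypothetical weights: the new, empty indexer (idx5) is favored 4:1.
weights = {"idx1": 1, "idx2": 1, "idx3": 1, "idx4": 1, "idx5": 4}

print(expected_shares(weights))
print(simulate_picks(weights, 10_000))
```

With these made-up weights the new indexer takes about half of the inbound connections, so the weights would need to be revisited once its disk usage catches up with the others.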

0 Karma

rbal_splunk
Splunk Employee

An enhancement request has been created for this functionality: SPL-76841: Allow Splunk admins to rebalance buckets after adding new indexers into the cluster

theunf
Communicator

Still on dirty solutions: if you take one old indexer offline at a time, the cluster will need to create new replicas, and the new indexer should be selected to hold them.
After all the replication status indicators turn green, bring that indexer back online and run remove excess buckets.
Splunk will keep the bucket copies that became searchable while the indexer was away, rather than reverting to the copies on the indexer that was offline.

I'll also go with that suggestion of changing RF... If you trust your backups, reduce SF and RF to 1 during a maintenance window and run remove excess buckets. When it finishes, raise RF and SF back to your defaults. Splunk will try to distribute buckets almost evenly across all indexers.

That's something that Hadoop claims to do easily, but there is still no clean way to do it with Splunk.

0 Karma

yannK
Splunk Employee

I am not aware of a nice method to achieve it.
Manually moving the buckets will be messy, and the cluster master will complain.

Dirty workaround:
Why not disable the splunktcp inputs on the old indexers, so that all incoming data goes to the new one?
When it's full enough, re-enable the splunktcp inputs on the old indexers.

0 Karma

Lowell
Super Champion

Sorry, I misread your original answer. Disable the inputs on the indexers with the highest disk utilization, which doesn't require a restart. Got it. Inbound s2s connections would be closed, and as long as indexer acknowledgements are enabled, no events should be lost and inbound events would be sent to the other indexers. The indexer with the disabled splunktcp input would continue to receive replicated buckets, just not any new buckets of its own, right? So that would reduce disk growth on that indexer by about 50% or so (depending on whether the replicated buckets are searchable or not).
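
The "about 50%" guess can be sanity-checked with a back-of-the-envelope Python sketch. It rests on assumptions that are mine, not Splunk's documented behavior: each bucket originates on a peer that still has inputs enabled, its extra copies land uniformly at random on the other peers, and every copy is the same size (so the searchable vs. non-searchable difference is ignored). The function `daily_growth` is hypothetical.

```python
def daily_growth(peers, replication_factor, disabled):
    """Daily disk growth per peer, as a fraction of total daily ingest.

    Returns (growth of a peer with inputs enabled,
             growth of a peer with inputs disabled).
    Assumes uniform random replica placement and equal-sized copies.
    """
    active = peers - disabled
    if peers < 2 or active < 1:
        raise ValueError("need at least 2 peers and 1 peer with inputs enabled")
    if not 1 <= replication_factor <= peers:
        raise ValueError("replication factor must be between 1 and the peer count")

    # Chance that a given peer holds a replica of a bucket that
    # originated on some other peer.
    replica_share = (replication_factor - 1) / (peers - 1)

    # A disabled peer only receives replicas.
    disabled_growth = replica_share
    # An active peer receives its own originals plus replicas of the
    # buckets originating on the other active peers.
    active_growth = 1 / active + (active - 1) / active * replica_share
    return active_growth, disabled_growth


baseline, _ = daily_growth(peers=5, replication_factor=2, disabled=0)
_, throttled = daily_growth(peers=5, replication_factor=2, disabled=4)
print(f"balanced peer:  {baseline:.3f} of daily ingest")
print(f"disabled peer:  {throttled:.3f} of daily ingest")
print(f"reduction:      {1 - throttled / baseline:.1%}")
```

In this model the reduction for a 5-peer cluster with RF=2 comes out somewhat below 50%, and it approaches 50% as the peer count grows. Smaller non-searchable copies would push the reduction higher.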

0 Karma

yannK
Splunk Employee

I assumed that your forwarders already had load balancing enabled. You just need to disable the splunktcp input, via the UI, the CLI, or a config change plus reload.

0 Karma

Lowell
Super Champion

It seems to me if the master can tell the peer node "hey, go copy bucket x to node y" there should be some way to manually trigger that same behavior. (But I'm guessing it's not in the REST API, or at least not documented.)

0 Karma

Lowell
Super Champion

Since we're talking about ugly workarounds... I thought about cranking up the replication factor temporarily, waiting for more buckets to get copied, then resetting the replication factor and removing excess buckets. Of course this has even more issues: (1) there is no guarantee where new bucket copies will end up, (2) there is no way to dictate which ones will be removed (the few buckets that do make it to the new box could be the ones the master chooses to remove), and (3) probably lots of restarts would be needed here too.

0 Karma

Lowell
Super Champion

I thought about that, but there are a few potential pitfalls. First, going from spreading data across 4 indexers down to just 1 is pretty drastic. Second, changes to outputs.conf require a restart. 😞

0 Karma