Knowledge Management

Indexer Rebalance Limits

dersonje2
Engager

Hello,

I'm not finding info on the limits within Splunk's data rebalancing. Some context: I have ~40 indexers and stood up 8 new ones. The 40 old ones had an average of ~150k buckets each. At some point the rebalance reported that it was complete (above the 0.9 threshold) even though there were only ~40k buckets on the new indexers. When I kicked off a second rebalance, it started from 20% again and continued rebalancing, because the new indexers were NOT yet space-limited on their SmartStore caches. The timeout was set to 11 hours and the first run finished in ~4. The master did not restart during this balancing.

Can anyone shed some more light on why the first rebalance died? Like, is there a 350k bucket limit per rebalance or something?


HiramMann
Loves-to-Learn

When the cluster meets the minimum threshold (e.g., 0.9 balance), the rebalance process considers its job “done,” even if distribution across the new indexers still isn’t as even as expected. That’s why the first rebalance stopped after ~4 hours, while the second one restarted from ~20% and continued moving more buckets. Essentially, Splunk rebalancing is designed to gradually optimize data distribution while minimizing cluster load, not to reach a perfectly even distribution in a single run.
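If you want to see what the cluster manager thinks the current state is, you can query the rebalance status from its CLI. A minimal sketch, assuming a reasonably recent Splunk Enterprise version (the exact output fields vary by version):

  # Run on the cluster manager (master) to show the state of the data rebalance
  splunk rebalance cluster-data -action status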


isoutamo
SplunkTrust

That's true. Also remember that rebalancing only counts the number of buckets when it does its work. Because buckets can have different sizes, disk space usage is not rebalanced, only the bucket counts.

In rebalancing there are two options:

  • rebalance primaries
  • rebalance buckets

The first one is done automatically in quite a few situations, e.g. after a rolling restart.

The second one is always manual work, and its target is set to the 90% level by default.
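For reference, a minimal sketch of starting that manual bucket rebalance from the cluster manager's CLI (the -max_runtime value of 660 is just my guess at how the 11-hour timeout from the question would be expressed, assuming the units are minutes; verify against the docs for your version):

  # Run on the cluster manager; starts a manual bucket rebalance across all indexes
  splunk rebalance cluster-data -action start

  # Optionally scope to one index and cap the runtime
  # (units for -max_runtime are assumed to be minutes here; check your version's docs)
  splunk rebalance cluster-data -action start -index _internal -max_runtime 660

  # Stop an in-progress rebalance early if needed
  splunk rebalance cluster-data -action stop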

What I have done myself is modify that percentage level. Depending on the environment I have used e.g. 95-99% to get a better distribution of buckets over the nodes. After you have reached a suitable level, you should adjust the percentage back to 90%.


livehybrid
SplunkTrust

Hi @dersonje2 

Splunk doesn’t impose a hard 350k-buckets-per-rebalance limit; it sounds like your first rebalance simply hit the default/configured rebalance threshold and the master declared “good enough” and stopped. By default the master will stop moving buckets once the cluster is within 90% of the ideal distribution. With over 40 indexers before adding the new ones, I'd guess that in theory only a fairly small number of buckets would end up on the new indexers, since the cluster will have started at roughly 80% distribution already.

If you want to make the spread more even, increase rebalance_threshold within the [clustering] stanza in server.conf on the cluster manager to a number closer to 1, where 1 = 100% distribution. This might improve the distribution you are getting.
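As a sketch, assuming the setting lives in the cluster manager's server.conf (the 0.97 value is only an illustration, not a recommendation):

  # server.conf on the cluster manager
  [clustering]
  # default is 0.9; a value closer to 1 keeps the rebalance moving buckets longer
  rebalance_threshold = 0.97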

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing


dersonje2
Engager

Thanks for confirming there shouldn't be a limit. I agree the cluster master decided it was good enough, but I don't understand how it could have hit an "ideal distribution" and then, minutes later, in another balancing run, recognize that another ~40k+ buckets per indexer needed to be moved to the same indexers again. It isn't too important because I just restarted the balancing runs until it was actually balanced, but it makes me wonder if this is the only bucket-based operation that has gremlins.
