We recently deployed 5 new indexers into site 2 of our 2-site clustered environment to replace 5 old ones in the same site (2). We have offlined the old indexers, and I am now attempting to rebalance the cluster.
Note that a large number of bucket-fixing activities are currently taking place, as the new indexers in site 2 are copying buckets from site 1 to re-establish data redundancy.
The problem: when I run a rebalance from the GUI on the cluster master, the rebalance starts successfully. Anywhere from a couple of minutes to an hour goes by while the completion percentage slowly climbs, as shown in splunkd.log:
06-23-2022 10:19:32.148 -0400 INFO CMMaster - data rebalance started, initial_work=900897
06-23-2022 10:19:32.148 -0400 INFO CMMaster - data rebalance completion percent=0.00
06-23-2022 10:20:02.534 -0400 INFO CMMaster - data rebalance completion percent=1.90
06-23-2022 10:20:32.893 -0400 INFO CMMaster - data rebalance completion percent=1.90
06-23-2022 09:51:49.099 -0400 INFO CMMaster - data rebalance completion percent=3.05
06-23-2022 09:52:21.558 -0400 INFO CMMaster - data rebalance completion percent=3.06
Then, seemingly at random, I get this error message in the logs, and the rebalance suddenly stops.
06-23-2022 10:04:58.657 -0400 INFO FixupStrategy - rebalance skipped all buckets, forcing a stop
06-23-2022 10:04:59.189 -0400 INFO CMMaster - data rebalance complete! percent=100.00
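To see how far each run actually got before being forced to stop, a small script can pull the progress lines out of splunkd.log. This is a minimal sketch, assuming the CMMaster and FixupStrategy messages always have exactly the format quoted above. The function name summarize_rebalance and the regexes are hypothetical helpers, not part of Splunk. The script only summarizes the log and does not explain why the buckets were skipped.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the splunkd.log excerpts above,
# e.g. "06-23-2022 10:19:32.148 -0400" (assumed format).
TS_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})")
# Matches only the periodic progress lines, not "data rebalance complete! percent=100.00".
PERCENT_RE = re.compile(r"CMMaster - data rebalance completion percent=([\d.]+)")
FORCED_STOP = "rebalance skipped all buckets, forcing a stop"


def summarize_rebalance(lines):
    """Hypothetical helper: return (highest_percent, stopped_at).

    highest_percent: highest completion percent logged before the forced stop, or None.
    stopped_at: timezone-aware datetime of the FixupStrategy forced-stop line, or None.
    """
    highest = None
    stopped_at = None
    for line in lines:
        ts_match = TS_RE.match(line)
        if not ts_match:
            continue  # skip lines without the expected timestamp prefix
        ts = datetime.strptime(ts_match.group(1), "%m-%d-%Y %H:%M:%S.%f %z")
        pct = PERCENT_RE.search(line)
        if pct:
            value = float(pct.group(1))
            highest = value if highest is None else max(highest, value)
        elif FORCED_STOP in line:
            stopped_at = ts
            # The "complete! percent=100.00" line that follows is not real progress.
            break
    return highest, stopped_at
```

Taking the maximum rather than the last percentage seen means the summary still works when the excerpt is not in strict time order, as in the lines quoted above.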
Searching the internet did not yield any results for this error message. Does anyone know what could be causing my rebalance to skip all buckets?