About joefuguet

joefuguet · 07-28-2022

Thanks Paul - I did end up running the rebalance again a couple of weekends back with "searchable=no". Some of our indexes are SmartStore, so maybe that was part of it. It worked fine this time. My best guess is that there was some conflict between the automatic bucket fixing triggered by adding new indexers to the cluster and the rebalance I was trying to run at the same time.

joefuguet · 07-28-2022

Thanks for this. To provide a delayed update: I reran the rebalance a couple of weeks back and it worked fine this time. I did run it with "searchable=no". Perhaps there was a conflict between the rebalance and the automatic bucket fixing that was taking place, since I had just joined multiple indexers to our cluster.

joefuguet · 06-28-2022

Thank you, sir. Since my original message, all automatic bucket fixing has finished, so the cluster is now in a stable state. I will take your recommendation to perform a rolling restart this weekend and then attempt another rebalance, this time with searchable=yes.

joefuguet · 06-28-2022

Hi Isoutamo,

Thanks for the response! Let me answer the questions I can, based on my last attempt to rebalance the cluster over the weekend. I will not be able to run another attempt until this weekend, but here's what I have:

1. Space is definitely sufficient on all nodes (we upgraded to machines with double the storage capacity).
2. IDX cluster status was NOT ok at the time of the rebalance attempt. It was rebalancing primaries (edit: I mean, automatic bucket fixing was occurring due to adding new indexers) at the same time as my rebalance attempt. On reflection, I feel this was most likely the problem: it did not make much sense on my part to rebalance the indexes while that was still going on.
3. This was done.
4. As stated above, these operations were occurring simultaneously.
5. Can test this upcoming weekend.
6. This was left at the default (blank).
7. Unsure. If this corresponds to the "Searchable" GUI option, it was NOT selected.
8. Tried multiple thresholds (0.9, 0.85, 0.8) with the same result.

joefuguet · 06-23-2022

We recently deployed 5 new indexers into site 2 of our 2-site clustered environment to replace 5 old ones in the same site. We have offlined the old indexers and I am now attempting to rebalance the cluster. I will note that a large amount of bucket-fixing activity is currently taking place, as the new indexers in site 2 are copying buckets from site 1 to reestablish data redundancy.

The problem is this: when I run a rebalance operation in the GUI from the cluster master, the rebalance begins successfully. Anywhere from a couple of minutes to an hour goes by while the completion % slowly climbs. This is shown in splunkd.log:

06-23-2022 10:19:32.148 -0400 INFO CMMaster - data rebalance started, initial_work=900897
06-23-2022 10:19:32.148 -0400 INFO CMMaster - data rebalance completion percent=0.00
06-23-2022 10:20:02.534 -0400 INFO CMMaster - data rebalance completion percent=1.90
06-23-2022 10:20:32.893 -0400 INFO CMMaster - data rebalance completion percent=1.90
06-23-2022 09:51:49.099 -0400 INFO CMMaster - data rebalance completion percent=3.05
06-23-2022 09:52:21.558 -0400 INFO CMMaster - data rebalance completion percent=3.06

Then, seemingly at random, I get this error message in the logs, and the rebalance suddenly stops:

06-23-2022 10:04:58.657 -0400 INFO FixupStrategy - rebalance skipped all buckets, forcing a stop
06-23-2022 10:04:59.189 -0400 INFO CMMaster - data rebalance complete! percent=100.00

Searching the internet did not yield any results for this error message. Does anyone know what could be causing my rebalance to skip all buckets?
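For anyone who wants to spot this condition without reading splunkd.log by hand, here is a minimal Python sketch. It assumes only the log line formats quoted in the post above; the function name `summarize_rebalance` and the constants are hypothetical and not part of any Splunk tooling. It reports the last real progress value seen and whether the "skipped all buckets" forced stop occurred, so the trailing "percent=100.00" line is not mistaken for a genuine completion.

```python
import re

# Matches the CMMaster progress lines as quoted in the post (assumed format).
PROGRESS = re.compile(r"CMMaster - data rebalance completion percent=(\d+(?:\.\d+)?)")
# Text of the FixupStrategy message that precedes the abrupt stop.
FORCED_STOP = "rebalance skipped all buckets, forcing a stop"


def summarize_rebalance(lines):
    """Return (last_progress_percent, forced_stop_seen) for splunkd.log lines."""
    last_percent = None
    forced_stop = False
    for line in lines:
        if FORCED_STOP in line:
            forced_stop = True
            break  # the "complete! percent=100.00" line that follows is not real progress
        match = PROGRESS.search(line)
        if match:
            last_percent = float(match.group(1))
    return last_percent, forced_stop


sample = [
    "06-23-2022 09:51:49.099 -0400 INFO CMMaster - data rebalance completion percent=3.05",
    "06-23-2022 09:52:21.558 -0400 INFO CMMaster - data rebalance completion percent=3.06",
    "06-23-2022 10:04:58.657 -0400 INFO FixupStrategy - rebalance skipped all buckets, forcing a stop",
    "06-23-2022 10:04:59.189 -0400 INFO CMMaster - data rebalance complete! percent=100.00",
]
print(summarize_rebalance(sample))
```

A forced stop at a low progress value, as in the sample, suggests the rebalance was cut short rather than finished; in this thread it coincided with automatic bucket fixing still running on the cluster.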

Posts	5
Solutions	0
Karma Given	4
Karma Received	0
Member Since	12-09-2021

Online Status	Offline
Date Last Visited	10-22-2022 02:18 PM

Why is Data Rebalance 'failing' after a couple min...

Re: Why is Data Rebalance 'failing' after a couple...

Re: Data Rebalance 'fails' after a couple minutes

Re: Data Rebalance 'fails' after a couple minutes

Re: Data Rebalance 'fails' after a couple minutes

Why is Data Rebalance 'failing' after a couple min...
