I have a two-peer, single-site index cluster (sf=2, rf=2). We noticed that the primaries for our indexes are not evenly distributed. We checked this with the following search:
| rest splunk_server=local /services/cluster/master/buckets
| rex field=title "^(?<repl_index>[^\~]+)" | search repl_index="*" standalone=0 frozen=*
| rename title AS bucketID | fields bucketID peers.*.search_state peers.*.bucket_flags frozen repl_index
| rename peers.3DAB62DE-6D21-4C93-B8E5-A65370709B79.bucket_flags as bucketflags
| eval prim=if(bucketflags = "0x0","prim_yes","prim_no")
| stats count by repl_index prim
| xyseries repl_index prim count
| fillnull prim_yes,prim_no
| eval ratio=prim_yes/(prim_yes+prim_no)
| eval ratio=round(ratio*100,2)
| search repl_index="*"
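For clarity, the ratio this search computes can be sketched outside Splunk as follows. This is only an illustration: the bucket IDs and flag values are made up, the function name is hypothetical, and the sketch simply follows the search's own convention that a bucket_flags value of 0x0 on the inspected peer counts as prim_yes.

```python
from collections import defaultdict

# Hypothetical sample rows: (bucket ID, bucket_flags reported for the inspected peer).
# These values are invented for illustration, not taken from a real cluster.
buckets = [
    ("main~1~AAAA", "0x0"),
    ("main~2~AAAA", "0x0"),
    ("main~3~BBBB", "0x1"),
    ("web~1~AAAA", "0x1"),
]


def primary_ratio(rows):
    """Return, per index, the percentage of buckets counted as prim_yes."""
    counts = defaultdict(lambda: {"prim_yes": 0, "prim_no": 0})
    for bucket_id, flags in rows:
        # Same idea as the rex: the index name is everything before the first "~".
        index = bucket_id.split("~", 1)[0]
        # Same convention as the eval in the search: 0x0 is treated as primary.
        key = "prim_yes" if flags == "0x0" else "prim_no"
        counts[index][key] += 1
    return {
        index: round(100 * c["prim_yes"] / (c["prim_yes"] + c["prim_no"]), 2)
        for index, c in counts.items()
    }


if __name__ == "__main__":
    for index, ratio in sorted(primary_ratio(buckets).items()):
        print(index, ratio)
```

A ratio near 50 for every index would mean the inspected peer holds about half of the primaries; values near 0 or 100 correspond to the skew described here.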
For any given index, more or less all primaries are on one indexer or the other, which results in uneven load because we have a search hotspot on one index.
We got a far better distribution after we set sf=1, removed the excess buckets, and set sf=2 again.
Unfortunately, after we stop an indexer for a while or do a rolling restart, the primaries are again very unevenly distributed (as seen in the first screenshot).
It is also possible to get an even distribution by stopping the cluster master and the peers at the same time and starting them again, but we lose data during that time. Restarting any single component on its own doesn't fix the issue.
We tried to rebalance the primaries using:
curl -k -u admin:plaseentercreditcardnumber --request POST https://localhost:8089/services/cluster/master/control/control/rebalance_primaries