After several years, the replication factor on my 6.6.3 indexer cluster recently changed to 'Not Met'. It has been fine in the past, and I can see buckets replicating among the 4 members of the cluster. I can see open connections on :9887 among the members, they show up in each other's splunkd.log as successful replications, and nothing has changed configuration-wise or even version-wise. Two members of the cluster each have about 200G less free space than the other two, and I cannot find anything that helps me figure out what the problem is.
The monitoring console says 'Not Met', 'show cluster-config' says 'not met', and the --verbose output on the master looks like this for every entry, with the Replicated copies and Searchable copies trackers showing the same numbers across the board:
network_wireless_aps
  Number of non-site aware buckets=0
  Number of buckets=28
  Size=366897499
  Searchable YES
  Replicated copies tracker 28/28 28/28
  Searchable copies tracker 28/28 28/28
network_wireless_controllers
  Number of non-site aware buckets=0
  Number of buckets=35
  Size=19069528111
  Searchable YES
  Replicated copies tracker 35/35 35/35
  Searchable copies tracker 35/35 35/35
(the same is true for the search factor, but I figure if one gets better, the other will too)
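For reference, the verbose status above was pulled on the master with the cluster CLI; a minimal sketch, assuming $SPLUNK_HOME/bin is on the PATH and you are logged in as an admin:

```shell
# Run on the cluster master; prints per-index replication and
# search factor status, including the copies trackers shown above.
splunk show cluster-status --verbose
```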
On your cluster master GUI, click the Indexes tab and then Bucket Status, with "select index" set to "all".
Any "in progress" or "pending" jobs?
If so, have a look at the time in fixup and the current status. This will give a hint as to why your replication/search factors are not met.
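If the master's web UI isn't available, the same fixup queue can be queried over REST; a sketch assuming the default management port 8089 and placeholder admin credentials (swap in your own, and drop -k if you have valid certs):

```shell
# List pending fixup tasks on the cluster master, per fixup level.
# Other levels include search_factor and generation.
curl -k -u admin:changeme \
  "https://localhost:8089/services/cluster/master/fixup?level=replication_factor&output_mode=json"
```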
I've been checking by comparing the byte sizes of the various directories in $SPLUNK_HOME/var/lib/splunk, and they are wildly different across the four machines.
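The comparison was along these lines, run on each peer in turn (a sketch; /opt/splunk is an assumed install path, adjust for your environment):

```shell
# Per-index on-disk size on one indexer, largest first.
# Repeat on every cluster member and compare the lists.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
du -sh "$SPLUNK_HOME"/var/lib/splunk/* | sort -rh
```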
After that rebalance command, however, they are more out of balance than before.
I'm really stumped, and the disk volume on two of the four boxes is approaching 95% full, enough to trigger alarms. The other two are holding pretty steady at 88% full.
Something must have worked its way out eventually with that rebalance command, because an hour after it finished, all 4 indexers showed lower disk utilization. I think it's fixed, even though I don't know exactly what happened.
The bit I forgot to say is that it's been like this for a week now. That's a bit long, even for 200G worth of buckets, yes?
I haven't ever enabled the web console on the master, so I found a couple of CLI commands to look for pending jobs, and there weren't any that I could find.
I did just kick off a 'rebalance cluster-data', though; I hadn't tried that yet. We'll see what happens.
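For anyone landing here later, the rebalance was started from the master; a minimal sketch of the commands (names per the 6.x CLI):

```shell
# Start a data rebalance across the peers; run on the cluster master.
splunk rebalance cluster-data -action start

# Check on it later, or stop it if needed:
splunk rebalance cluster-data -action status
splunk rebalance cluster-data -action stop
```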