We have an issue where all the scheduled searches are getting skipped whenever a rolling restart is in progress.
Also, for the past few weeks, we have observed that the cluster master automatically initiates a rolling restart of the indexers twice a week. It takes about 24 hours to restart all 24 indexers in the cluster, which also impacts our business.
Has anyone encountered this situation before?
The master should _not_ initiate a restart out of thin air; something must be triggering it.
The timeouts and wait times can impact whether your searches get skipped.
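As a rough, hedged back-of-the-envelope check, the sketch below compares the observed pace from the original post (about 24 hours for 24 indexers) with the per-peer timeouts in the stanza quoted later in this thread. It assumes peers restart one at a time and that a slow peer could use up both `decommission_force_timeout` and `restart_inactivity_timeout` in full; both are assumptions, not documented restart behavior.

```python
# Back-of-the-envelope comparison of the observed restart pace with the
# per-peer timeouts from the [clustering] stanza quoted in this thread.
# Assumption: peers restart one at a time, and a slow peer may consume
# decommission_force_timeout and restart_inactivity_timeout in full.

OBSERVED_HOURS = 24   # "about 24 hours" from the original post
INDEXERS = 24         # indexers in the cluster

DECOMMISSION_FORCE_TIMEOUT = 900   # seconds, from the stanza
RESTART_INACTIVITY_TIMEOUT = 1500  # seconds, from the stanza


def per_peer_seconds(total_hours: float, peers: int) -> float:
    """Average wall-clock seconds spent per peer if peers restart sequentially."""
    return total_hours * 3600 / peers


def timeout_budget(decommission: int, inactivity: int) -> int:
    """Upper bound (under the stated assumption) of timeout waits per peer."""
    return decommission + inactivity


if __name__ == "__main__":
    observed = per_peer_seconds(OBSERVED_HOURS, INDEXERS)
    budget = timeout_budget(DECOMMISSION_FORCE_TIMEOUT, RESTART_INACTIVITY_TIMEOUT)
    print(f"observed average per peer: {observed:.0f} s")
    print(f"timeout budget per peer:   {budget} s")
```

If the observed average is well above the timeout budget, the extra time is likely spent somewhere other than these two waits (for example, peers that are slow to start), which is worth checking in the master's logs.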
Thanks for pointing me in the right direction!
Here's our config from the [clustering] stanza:
[clustering]
mode = master
multisite = true
available_sites = site1, site2
site_replication_factor = origin:1, site1:1, site2:1, total:3
site_search_factor = origin:1, site1:1, site2:1, total:2
cluster_label = cluster1
maintenance_mode = false
max_peers_to_download_bundle = 10
service_interval = 10
heartbeat_timeout = 1800
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 5
rolling_restart = searchable
restart_timeout = 500
decommission_force_timeout = 900
restart_inactivity_timeout = 1500
rebalance_threshold = 0.96
max_auto_service_interval = 250
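To sanity-check settings like these, one option is a small script that loads the stanza with Python's standard `configparser` and validates `rolling_restart` against the values documented in server.conf.spec (`restart`, `shutdown`, `searchable`, `searchable_force`). This is a minimal sketch: the stanza below is abridged to the keys it checks, and the `load_clustering` and `check` helpers are hypothetical names, not part of any Splunk tooling.

```python
import configparser

# The [clustering] stanza quoted above, abridged to the keys checked here.
STANZA = """
[clustering]
mode = master
multisite = true
rolling_restart = searchable
restart_timeout = 500
decommission_force_timeout = 900
restart_inactivity_timeout = 1500
max_peers_to_download_bundle = 10
"""

# Values documented for rolling_restart in server.conf.spec (default: restart).
ROLLING_RESTART_MODES = {"restart", "shutdown", "searchable", "searchable_force"}
TIMEOUT_KEYS = ("restart_timeout", "decommission_force_timeout", "restart_inactivity_timeout")


def load_clustering(text: str) -> configparser.SectionProxy:
    """Parse the stanza text and return the [clustering] section (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["clustering"]


def check(section: configparser.SectionProxy) -> list[str]:
    """Return human-readable notes about rolling-restart-related settings."""
    notes = []
    mode = section.get("rolling_restart", "restart")
    if mode not in ROLLING_RESTART_MODES:
        notes.append(f"unknown rolling_restart value: {mode!r}")
    elif mode == "searchable":
        notes.append("searchable runs a health check first; searchable_force overrides it")
    # Report only the timeout keys actually present in the stanza.
    for key in TIMEOUT_KEYS:
        if key in section:
            seconds = section.getint(key)
            notes.append(f"{key} = {seconds} s ({seconds / 60:.1f} min)")
    return notes


if __name__ == "__main__":
    for note in check(load_clustering(STANZA)):
        print(note)
```

A misspelled value such as `searchble_force` is reported as unknown instead of being silently accepted, which is the main point of validating against the documented set.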
I suspect a few things here: for example, that "rolling_restart" should be "searchable_force", and that max_peers_to_download_bundle should be more than 10, considering we have 24 indexers?
I will go through all these parameters and understand them in detail.
Do you see anything unusual in this configuration? It would be very helpful. Thanks!
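On the second point: as I understand server.conf.spec, `max_peers_to_download_bundle` caps how many peers download the configuration bundle from the master at the same time, so with 24 indexers and a cap of 10 the download happens in a few rounds. The sketch below only does that arithmetic; `download_waves` is a hypothetical helper, and it assumes each round fills all available slots.

```python
import math


def download_waves(peers: int, max_parallel: int) -> int:
    """Rounds needed if at most max_parallel peers download the bundle at once."""
    if peers <= 0 or max_parallel <= 0:
        raise ValueError("peers and max_parallel must be positive")
    return math.ceil(peers / max_parallel)


if __name__ == "__main__":
    print(download_waves(24, 10))  # 24 indexers, max_peers_to_download_bundle = 10
```

Raising the cap mainly shortens bundle distribution; it is not obviously related to how long the rolling restart itself takes.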