We have a Splunk distributed cluster with 3 indexers, 3 search heads, and 1 cluster master. The cluster was healthy when the setup was completed.
However, after switching log traffic over to the Splunk indexers, one of the search heads detached from the cluster and keeps restarting.
Below is the error I see in splunkd.log on the search head that is having the problem.
01-11-2022 17:39:57.495 +0000 INFO MetricSchemaProcessor [1424 typing] - channel confkey=source::/opt/splunk/var/log/introspection/disk_objects.log|host::splunk-shc-search-head-2|splunk_intro_disk_objects|CLONE_CHANNEL has an event with no measure, will be skipped.
01-11-2022 17:39:57.751 +0000 INFO IndexProcessor [1173 MainThread] - handleSignal : Disabling streaming searches.
01-11-2022 17:39:57.752 +0000 INFO IndexProcessor [1173 MainThread] - request state change from=RUN to=SHUTDOWN_SIGNALED
01-11-2022 17:39:57.752 +0000 INFO SHClusterMgr [1173 MainThread] - Starting to Signal shutdown RAFT
01-11-2022 17:39:57.752 +0000 INFO SHCRaftConsensus [1173 MainThread] - Shutdown signal received.
01-11-2022 17:39:57.752 +0000 INFO SHClusterMgr [1173 MainThread] - Signal shutdown RAFT completed
01-11-2022 17:39:57.752 +0000 INFO UiHttpListener [1173 MainThread] - Shutting down webui
01-11-2022 17:39:57.752 +0000 INFO UiHttpListener [1173 MainThread] - Shutting down webui completed
Any insights on what is causing this?
Try restarting all the search heads one by one; it looks like this one is unable to get in sync with the others.
After that, check whether a captain is being assigned and whether this search head took part in the election.
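A minimal sketch of how to check membership and captain status from the CLI; the install path, hostnames, and `admin:changeme` credentials are placeholders for illustration:

```shell
# Run on any SHC member. Shows each member's status and which one is captain.
/opt/splunk/bin/splunk show shcluster-status -auth admin:changeme

# If no captain ever gets elected, you can force a static captain election
# (only do this when the dynamic election is genuinely stuck):
# /opt/splunk/bin/splunk bootstrap shcluster-captain \
#     -servers_list "https://sh1:8089,https://sh2:8089,https://sh3:8089" \
#     -auth admin:changeme
```

If the problem member shows as `Down` or never appears in the member list while the others elect a captain normally, that points at that one node rather than the cluster as a whole.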
Thanks. I restarted multiple times and it didn't sync up. As this is running in production, I couldn't wait, so I deleted the search head cluster and re-created it. It worked fine after that. Still no clue why we had this problem in the first place.
I haven't been in this exact situation, but in similar cases, removing that node from the SHC and/or resyncing it from the others has usually resolved it. There's no need to recreate the whole SHC.
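For reference, a sketch of the remove/resync approach described above; hostnames, the install path, and credentials are placeholders, and the commands should be run on the problem member:

```shell
# Option 1: resync the member's replicated configuration from the current
# captain, without removing it from the cluster:
/opt/splunk/bin/splunk resync shcluster-replicated-config -auth admin:changeme

# Option 2: remove the member from the SHC entirely...
/opt/splunk/bin/splunk remove shcluster-member -auth admin:changeme

# ...and later re-add it, pointing at any healthy existing member:
# /opt/splunk/bin/splunk add shcluster-member \
#     -current_member_uri https://sh1:8089 -auth admin:changeme
```

Restart the member after the resync or re-add and then confirm with `splunk show shcluster-status` that it rejoins and participates in the captain election.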