Why is our search head cluster scheduler failing following deployment or rolling restart?

duncangoff
Engager

We have a problem with the scheduler failing following a search head cluster (SHC) deployment, which is resolved only if we manually change the captain following the deployment. This is not an ideal solution, and we want to sort out the root cause.

Following last night's deployment, we saw the following sequence of events (mostly from the debug logs):

The SHC rolling restart begins. All peers are told to close down their searches in turn. The restarts complete normally with no errors.

Then the captain tells peers to remove artifacts ("DEBUG SHCMaster - remove artifact aid=scheduler~"). Most work fine, but two fail with the following errors:

"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ status=failed msg=sid is not an artifact but a remote search job "
"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ status=failed msg="Could not find artifact or sid"

From then on, the scheduler keeps repeating these errors, and no scheduled searches, accelerations, alerts, etc. run until the captain is transferred.

I can't tell whether this is a symptom or the cause. My guess is that something went wrong with those searches, but what? And how do we stop it from happening?
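
To see which sids/aids keep failing and how often, the repeated errors can be tallied from the debug log. The following is only a minimal sketch in Python: the regular expression is inferred from the two log lines quoted above, and the names `FAILED_EVENT`, `summarize_failures` and `sample` are hypothetical helpers, not part of Splunk.

```python
import re
from collections import Counter

# Pattern inferred from the two quoted log lines (an assumption, not an
# official log format). The identifier key may be sid= or aid=.
FAILED_EVENT = re.compile(
    r"event=SHPMaster::asyncReplicationArtifact\s+"
    r"(?P<key>sid|aid)=(?P<id>\S+)\s+status=failed\s+msg=(?P<msg>.*)"
)


def summarize_failures(lines):
    """Count failed asyncReplicationArtifact events per (key, id, message)."""
    counts = Counter()
    for line in lines:
        match = FAILED_EVENT.search(line)
        if match:
            # Normalize the message: drop surrounding whitespace and quotes.
            msg = match.group("msg").strip().strip('"').strip()
            counts[(match.group("key"), match.group("id"), msg)] += 1
    return counts


# Sample input copied from the lines quoted in this post.
sample = [
    "DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ "
    "status=failed msg=sid is not an artifact but a remote search job ",
    "DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ "
    'status=failed msg="Could not find artifact or sid"',
    "DEBUG SHCMaster - remove artifact aid=scheduler~",
]

for (key, ident, msg), n in summarize_failures(sample).items():
    print(f"{key}={ident} failures={n} msg={msg}")
```

Feeding it the lines of the real log file instead of `sample` would show whether the same two searches are the only ones blocking the scheduler.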

lakshman239
Influencer

It looks to me as though the captain election is not happening after the deployment/restart. Have you tried clearing the RAFT status? You would also need to ensure that the KV store is healthy across all members, and check the monitoring console for any issues reported by the SH members. https://docs.splunk.com/Documentation/Splunk/7.2.3/DistSearch/Handleraftissues
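
As a rough sketch of the health checks mentioned above, the two status commands from the Splunk CLI (`splunk show shcluster-status` and `splunk show kvstore-status`) could be collected in a small Python helper. The function names `health_check_commands` and `run_checks` are hypothetical, the path to the `splunk` binary is an assumption, and the commands may prompt for credentials when run on a member.

```python
import subprocess


def health_check_commands(splunk_bin="splunk"):
    """Return the CLI invocations for the SHC and KV store status checks."""
    return [
        [splunk_bin, "show", "shcluster-status"],
        [splunk_bin, "show", "kvstore-status"],
    ]


def run_checks(commands):
    """Run each command and return (command, exit code, stdout) tuples."""
    results = []
    for cmd in commands:
        done = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), done.returncode, done.stdout))
    return results


# Only print the commands here; call run_checks(...) on a search head
# cluster member to actually execute them.
for cmd in health_check_commands():
    print(" ".join(cmd))
```

Running this on each member after a rolling restart and comparing the output would show whether every member agrees on the captain and whether the KV store reports a healthy state.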

duncangoff
Engager

The captain election happens fine with no issues, and the same goes for the KV store.
