Deployment Architecture

Why is our search head cluster scheduler failing following deployment or rolling restart?

duncangoff
Engager

We have a problem with the scheduler failing following a search head cluster (SHC) deployment, which is resolved only if we manually change the captain following the deployment. This is not an ideal solution, and we want to sort out the root cause.

Following last nights deployment, we saw the following sequence of events (mostly from the debug logs);

SHC Rolling Restart begins...All peers told to close down their searches in turn...Restarts complete normally with no error...

Then, Captain tells peers to remove artifacts "DEBUG SHCMaster - remove artifact aid=scheduler~" Most work fine, but two fail with the following errors;

"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ status=failed msg=sid is not an artifact but a remote search job "
"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ status=failed msg="Could not find artifact or sid"

From then on, the scheduler keeps repeating these errors and no scheduler searches, accelerations, alerts etc run until the captain is transferred.

Couldn't tell you if this is a symptom or cause. I can hazard a guess something went wrong with those searches, but what? And how do we stop it happening?

0 Karma

lakshman239
Influencer

Looks to me that following deployment/restart the captain election is not happing. have you tried clearing the RAFT status? Also, you would need to ensure the health of the KVstore across members is good. Also, look at the monitoring console for any issues from the SH members. https://docs.splunk.com/Documentation/Splunk/7.2.3/DistSearch/Handleraftissues

0 Karma

duncangoff
Engager

The Captain election happens fine with no issues, same for KV store

0 Karma
Get Updates on the Splunk Community!

🔐 Trust at Every Hop: How mTLS in Splunk Enterprise 10.0 Makes Security Simpler

From Idea to Implementation: Why Splunk Built mTLS into Splunk Enterprise 10.0  mTLS wasn’t just a checkbox ...

Observe and Secure All Apps with Splunk

  Join Us for Our Next Tech Talk: Observe and Secure All Apps with SplunkAs organizations continue to innovate ...

Splunk Decoded: Business Transactions vs Business IQ

It’s the morning of Black Friday, and your e-commerce site is handling 10x normal traffic. Orders are flowing, ...