Deployment Architecture

Why is our search head cluster scheduler failing following deployment or rolling restart?

duncangoff
Engager

We have a problem with the scheduler failing following a search head cluster (SHC) deployment, which is resolved only if we manually change the captain following the deployment. This is not an ideal solution, and we want to sort out the root cause.

Following last night's deployment, we saw the following sequence of events (mostly from the debug logs):

SHC rolling restart begins... All peers are told to close down their searches in turn... Restarts complete normally with no errors...

Then the captain tells peers to remove artifacts ("DEBUG SHCMaster - remove artifact aid=scheduler~"). Most work fine, but two fail with the following errors:

"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ status=failed msg=sid is not an artifact but a remote search job "
"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ status=failed msg="Could not find artifact or sid"

From then on, the scheduler keeps repeating these errors, and no scheduled searches, accelerations, alerts, etc. run until the captain is transferred.

I can't tell whether this is a symptom or the cause. My guess is that something went wrong with those searches, but what? And how do we stop it from happening?
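To confirm that the same artifacts keep failing, rather than new ones each time, the repeated errors could be tallied per sid/aid. The following is only a rough sketch: the regular expression is inferred from the two DEBUG lines quoted above (the real splunkd.log format may differ), and the function names and the threshold are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the two quoted DEBUG lines; this is an assumption,
# not a documented log format.
FAILURE_RE = re.compile(
    r"SHCMaster - event=SHPMaster::asyncReplicationArtifact\s+"
    r"(?:sid|aid)=(?P<id>\S+)\s+status=failed\s+msg=(?P<msg>.*)"
)


def count_replication_failures(lines):
    """Count failed asyncReplicationArtifact events per sid/aid value."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group("id")] += 1
    return counts


def stuck_artifacts(lines, threshold=3):
    """Return ids that failed at least `threshold` times (hypothetical cutoff)."""
    counts = count_replication_failures(lines)
    return sorted(i for i, n in counts.items() if n >= threshold)
```

Feeding it the lines of the captain's log (for example `stuck_artifacts(open(path))`, where `path` points at your own log file) would list the ids that the captain keeps retrying.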

0 Karma

lakshman239
Influencer

It looks to me as though the captain election is not happening following the deployment/restart. Have you tried clearing the RAFT status? You would also need to ensure that the KV store is healthy across all members, and check the Monitoring Console for any issues reported by the SH members. https://docs.splunk.com/Documentation/Splunk/7.2.3/DistSearch/Handleraftissues
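The checks above can be run from any member with the Splunk CLI. Here is a small sketch that only assembles the commands and runs them on request. It assumes the binary lives at `/opt/splunk/bin/splunk` (adjust for your install), the helper names are invented for illustration, and the CLI will still need credentials (an interactive prompt or `-auth`). The transfer command is the manual workaround described in the question, not a fix for the root cause.

```python
import subprocess

# Assumed install path; change to match your $SPLUNK_HOME.
SPLUNK_BIN = "/opt/splunk/bin/splunk"


def status_commands(splunk_bin=SPLUNK_BIN):
    """CLI commands that report SHC captaincy/member state and KV store state."""
    return [
        [splunk_bin, "show", "shcluster-status"],
        [splunk_bin, "show", "kvstore-status"],
    ]


def transfer_captain_command(new_captain_mgmt_uri, splunk_bin=SPLUNK_BIN):
    """Command to hand captaincy to the member at the given management URI."""
    return [splunk_bin, "transfer", "shcluster-captain",
            "-mgmt_uri", new_captain_mgmt_uri]


def run(cmd):
    """Run one command and return (exit code, stdout) without raising."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout
```

For example, `for cmd in status_commands(): print(run(cmd))` on each member after the rolling restart would show whether every member agrees on the captain and whether the KV store reports as ready.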

0 Karma

duncangoff
Engager

The captain election happens fine with no issues, and the same goes for the KV store.

0 Karma