I have a search head cluster (SHC) with three search heads.
One SHC member goes for a restart but never comes back up. This is the log line:
INFO SHCSlave - event=SHPSlave::handleHeartbeatDone master has instructed peer to restart
The SHC has three members with a dynamic captain.
What could be going wrong?
Essentially, the directory permissions on /slave-apps/ on the search peer had been lost (why?) and the directory was set to read-only. As per the link above, resetting the permissions allowed the Cluster Master to once again populate the directory with the required apps.
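
For anyone wanting to check for the same condition, here is a minimal Python sketch of the permission check and reset described above. The helper names `is_writable_dir` and `restore_owner_write` are hypothetical and are not part of Splunk. The directory path is deliberately left as a parameter, since the full slave-apps path depends on your installation. It assumes the script runs as the owner of the directory (typically the user Splunk runs as).

```python
import os
import stat


def is_writable_dir(path):
    """Return True if `path` is a directory the current user can enter and write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def restore_owner_write(path):
    """Add owner read/write/execute bits to `path`, leaving group/other bits untouched.

    Returns the resulting permission bits (for example 0o755).
    Assumption: the caller owns the directory, otherwise chmod raises PermissionError.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IRWXU)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Call `is_writable_dir()` with the real slave-apps path on the affected peer first. Only if it returns False is a permission reset likely to be relevant. Note that this only touches the top-level directory, so anything beneath it that has also become read-only would need the same treatment.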