We have 3 search heads in a cluster. The splunkd process went down on two of them. When I checked the master search head (the captain), I saw the errors below. Restarting splunkd resolved the issue, but I need to understand why this happened, because all scheduled searches and reports failed during that period.
WARN TcpOutputFd - Connect to :8999 failed. Connection refused
ERROR TcpOutputFd - Connection to host=xxxx:8999 failed
WARN ArtifactReplicator - Connection failed
If 2 out of 3 SH nodes were down, then your cluster was not really "working". The SH captain was not able to reach the other search heads, which can cause connection refused errors like the ones you posted.
Did you still see the errors after starting the 2 nodes that were down?
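If the errors do come back, a quick connectivity check from the captain can show whether the members are refusing connections at that moment. This is only a minimal sketch: it assumes bash and the coreutils `timeout` command are available, it uses port 8999 because that is the port in the posted log lines (presumably the replication port, given the ArtifactReplicator warning), and `check_port` is a hypothetical helper name, not a Splunk command.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds,
# non-zero otherwise (refused, unreachable, or timed out).
check_port() {
  local host="$1" port="$2"
  # bash treats /dev/tcp/HOST/PORT as a TCP connection when used in a redirection
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Pass your own search head host names as arguments to the script.
for member in "$@"; do
  if check_port "$member" 8999; then
    echo "$member:8999 reachable"
  else
    echo "$member:8999 NOT reachable"
  fi
done
```

Run it from the captain with the other members' host names as arguments. A member that shows as not reachable while splunkd is supposedly up would be the one to look at first.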
You can check your cluster's current status by running this command on any SH cluster member:
$SPLUNK_HOME/bin/splunk show shcluster-status
Do you have captain_is_adhoc_searchhead = true in your server.conf (under the [shclustering] stanza)? If so, then only the non-captain search heads will run scheduled searches.
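One way to see what is actually in effect on a member is btool, which prints the merged configuration and, with --debug, the file each line comes from. A rough sketch, assuming /opt/splunk as the install path when SPLUNK_HOME is not set; `show_adhoc_setting` is just an illustrative wrapper name:

```shell
# Print the effective captain_is_adhoc_searchhead line and the file it comes from.
# /opt/splunk is an assumed default; pass a different path as the first argument.
show_adhoc_setting() {
  local splunk_home="${1:-${SPLUNK_HOME:-/opt/splunk}}"
  if [ ! -x "$splunk_home/bin/splunk" ]; then
    echo "splunk binary not found under $splunk_home" >&2
    return 1
  fi
  # btool lists the merged [shclustering] stanza; grep keeps only the setting of interest
  "$splunk_home/bin/splunk" btool server list shclustering --debug \
    | grep -i 'captain_is_adhoc_searchhead' \
    || echo "captain_is_adhoc_searchhead not found in btool output"
}
```

Calling `show_adhoc_setting` on each member lets you compare the values. If it is true everywhere, then whichever node is captain will not run scheduled searches.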