We have a problem with the scheduler failing following a search head cluster (SHC) deployment, which is resolved only if we manually change the captain following the deployment. This is not an ideal solution, and we want to sort out the root cause.
Following last nights deployment, we saw the following sequence of events (mostly from the debug logs);
SHC Rolling Restart begins...All peers told to close down their searches in turn...Restarts complete normally with no error...
Then, Captain tells peers to remove artifacts "DEBUG SHCMaster - remove artifact aid=scheduler~" Most work fine, but two fail with the following errors;
"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact sid=154~ status=failed msg=sid is not an artifact but a remote search job "
"DEBUG SHCMaster - event=SHPMaster::asyncReplicationArtifact aid=154~ status=failed msg="Could not find artifact or sid"
From then on, the scheduler keeps repeating these errors and no scheduler searches, accelerations, alerts etc run until the captain is transferred.
Couldn't tell you if this is a symptom or cause. I can hazard a guess something went wrong with those searches, but what? And how do we stop it happening?
... View more
Given the option of compressing files using the above technologies which would be the best practise for use with Splunk Analytics for Hadoop (Hunk)?
From my own research I'm leaning towards snappy for searches, lzma for space with gzip and bzip2 in the middle space with bzip2 being generally better than gzip.
I have also seen others recommending LZO, although i'm not sure if this is an option for this specific case.
... View more