I have a case where some indexers take 4 to 5 hours to join the cluster. The system shows no/little system usage (CPU, Mem, I/O).
The splunkd.log appears to loop through the same log entries multiple times. Then, the indexer continues loading when I see a log entry: Running job=BundleForcefulStateMachineResetJob
After this reset job is ran I quickly see the public key for the master loaded and the indexer joins the cluster shortly thereafter.
Here is a snippet of the log:
10-13-2022 11:22:02.293 -0700 WARN HttpListener - Socket error from 127.0.0.1:54240 while accessing /servicesNS/splunk-system-user/splunk_archiver/search/jobs: Broken pipe 10-13-2022 11:44:08.721 -0700 INFO PipelineComponent - MetricsManager:probeandreport() took longer than seems reasonable (1103256 milliseconds) in callbackRunnerThread. Might indicate hardware or splunk limitations. 10-13-2022 11:44:24.950 -0700 INFO PipelineComponent - CallbackRunnerThread is unusually busy, this may cause service delays: time_ms=1119484 new=0 null=0 total=56 {'name':'DistributedRestCallerCallback','valid':'1','null':'0','last':'3','time_ms':'0'},{'name':'HTTPAuthManager:timeoutCallback','valid':'1','null':'0','last':'1','time_ms':'0'},{'name':'IndexProcessor:ipCallback-0','valid':'1','null':'0','last':'6','time_ms':'4000'},{'name':'IndexProcessor:ipCallback-1','valid':'1','null':'0','last':'19','time_ms':'4000'},{'name':'IndexProcessor:ipCallback-2','valid':'1','null':'0','last':'30','time_ms':'4062'},{'name':'IndexProcessor:ipCallback-3','valid':'1','null':'0','last':'41','time_ms':'4164'},{'name':'MetricsManager:probeandreport','valid':'1','null':'0','last':'0','time_ms':'1103256'},{'name':'PullBasedPubSubSvr:timerCallback','valid':'1','null':'0','last':'2','time_ms':'0'},{'name':'ThreadedOutputProcessor:timerCallback','valid':'4','null':'0','last':'40','time_ms':'0'},{'name':'triggerCollection','valid':'44','null':'0','last':'55','time_ms':'0'} 10-13-2022 12:00:00.001 -0700 INFO ExecProcessor - setting reschedule_ms=3599999, for command=/opt/splunk/bin/python3.7 /opt/splunk/etc/apps/splunk_instrumentation/bin/instrumentation.py 10-13-2022 12:18:32.106 -0700 WARN DispatchReaper - Failed to read search info for id=1665688686.28 10-13-2022 12:19:02.105 -0700 WARN DispatchReaper - Failed to read search info for id=1665688686.28 10-13-2022 12:19:32.106 -0700 WARN DispatchReaper - Failed to read search info for id=1665688686.28 10-13-2022 12:20:02.105 -0700 WARN DispatchReaper - Failed to read search info for id=1665688686.28 10-13-2022 12:20:30.137 -0700 WARN HttpListener - Socket error from 127.0.0.1:54544 while accessing /servicesNS/splunk-system-user/splunk_archiver/search/jobs: Broken pipe 10-13-2022 12:29:09.955 -0700 INFO PipelineComponent - MetricsManager:probeandreport() took longer than seems reasonable (2182584 milliseconds) in callbackRunnerThread. Might indicate hardware or splunk limitations. 10-13-2022 12:29:25.957 -0700 INFO PipelineComponent - CallbackRunnerThread is unusually busy, this may cause service delays: time_ms=2198585 new=1 null=0 total=57 {'name':'DistributedRestCallerCallback','valid':'1','null':'0','last':'3','time_ms':'0'},{'name':'HTTPAuthManager:timeoutCallback','valid':'1','null':'0','last':'1','time_ms':'0'},{'name':'IndexProcessor:ipCallback-0','valid':'1','null':'0','last':'6','time_ms':'4000'},{'name':'IndexProcessor:ipCallback-1','valid':'1','null':'0','last':'19','time_ms':'4000'},{'name':'IndexProcessor:ipCallback-2','valid':'1','null':'0','last':'30','time_ms':'4000'},{'name':'IndexProcessor:ipCallback-3','valid':'1','null':'0','last':'41','time_ms':'4000'},{'name':'MetricsManager:probeandreport','valid':'1','null':'0','last':'0','time_ms':'2182584'},{'name':'PullBasedPubSubSvr:timerCallback','valid':'1','null':'0','last':'2','time_ms':'0'},{'name':'ThreadedOutputProcessor:timerCallback','valid':'4','null':'0','last':'40','time_ms':'0'},{'name':'triggerCollection','valid':'45','null':'0','last':'56','time_ms':'0'} 10-13-2022 12:46:13.298 -0700 INFO PipelineComponent - MetricsManager:probeandreport() took longer than seems reasonable (496854 milliseconds) in callbackRunnerThread. Might indicate hardware or splunk limitations. 10-13-2022 12:46:13.867 -0700 WARN HttpListener - Socket error from 127.0.0.1:54220 while accessing /services/data/indexes: Broken pipe 10-13-2022 12:46:13.907 -0700 WARN HttpListener - Socket error from 127.0.0.1:54254 while accessing /services/data/indexes: Broken pipe 10-13-2022 12:46:13.931 -0700 WARN HttpListener - Socket error from 127.0.0.1:54560 while accessing /services/data/indexes: Broken pipe 10-13-2022 12:46:13.955 -0700 WARN HttpListener - Socket error from 127.0.0.1:54538 while accessing /services/data/indexes: Broken pipe 10-13-2022 12:46:17.070 -0700 INFO BundleJob - Running job=BundleForcefulStateMachineResetJob
... View more