The sequence seems to be that the captain delegates the search to one search head, but hits the error indicated by "status=delegated_remote_error" in the first log message in the original post. It then, understandably, delegates the search to a different search head. However, the first search head also runs the search, which is why it executes twice.
I have seen the same problem. When I looked at splunkd.log on the search head for which the captain reported "status=delegated_remote_error", I saw these errors:
06-29-2016 10:04:01.014 -0400 WARN ISplunkDispatch - sid:scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 Gave up waiting for the captain to establish a common bundle version across all search peers; using most recent bundles on all peers instead
06-29-2016 10:04:55.643 -0400 WARN SHPMasterHTTPProxy - Low Level http request failure err=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n <messages>\n <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n </messages>\n</response>\n"
06-29-2016 10:04:55.643 -0400 WARN SHPSlave - event=SHPSlave::delegatedJobCompletion Failed to notify captain of job completion job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6. reason=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n <messages>\n <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n </messages>\n</response>\n"
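For anyone trying to confirm whether they are hitting the same thing, a search like the following should surface these warnings across all cluster members without logging in to each box. This is just a sketch: the component names are taken from the log lines above, and the exact set of components worth including is an assumption.

```
index=_internal sourcetype=splunkd log_level=WARN
    (component=SHPMasterHTTPProxy OR component=SHPSlave OR component=ISplunkDispatch)
| stats count by host, component
```

If the counts line up with the times your scheduled search ran twice, that supports the delegation/re-delegation theory above.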
Does this give any clues, to someone who knows the inner workings, as to what's going wrong?