I'm receiving duplicate alerts for a given scheduled job; it appears to be scheduled twice.
I see a delegated_remote_error and then the job gets scheduled twice. Why is this happening?
05-20-2015 04:00:27.427 -0700 INFO SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_error, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 04:00:31.015 -0700 INFO SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04
05-20-2015 04:00:55.352 -0700 INFO SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 05:11:11.858 -0700 INFO SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04 success=1
05-20-2015 05:11:11.878 -0700 INFO SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC success=1
The sequence seems to be that the captain delegates the search to one search head, but suffers the error indicated by "status=delegated_remote_error" in the first log message in the original post. Then, understandably, it delegates the search to a different search head. But the first search head also runs the search.
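If it helps anyone reproduce this, a rough way to spot runs that were delegated more than once is to search the scheduler log in _internal and group by scheduled_time. This is only a sketch, assuming the default sourcetype=scheduler for scheduler.log and using the savedsearch_name from the logs above as an example:

index=_internal sourcetype=scheduler savedsearch_name="My Report" status=delegated_remote
| stats dc(sid) AS delegations, values(peer_guid) AS peers BY savedsearch_name, scheduled_time
| where delegations > 1

Each row returned is a scheduled run that was handed to more than one search head cluster member for the same scheduled_time.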
I have seen the same problem. When I looked at splunkd.log on the search head that the captain logged "status=delegated_remote_error" for, I saw these errors:
06-29-2016 10:04:01.014 -0400 WARN ISplunkDispatch - sid:scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 Gave up waiting for the captain to establish a common bundle version across all search peers; using most recent bundles on all peers instead
06-29-2016 10:04:55.643 -0400 WARN SHPMasterHTTPProxy - Low Level http request failure err=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n <messages>\n <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n </messages>\n</response>\n"
06-29-2016 10:04:55.643 -0400 WARN SHPSlave - event=SHPSlave::delegatedJobCompletion Failed to notify captain of job completion job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6. reason=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n <messages>\n <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n </messages>\n</response>\n"
Does this give any clues, to someone who knows the inner workings, as to what's going wrong?
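In case it's useful, a quick way to see whether other cluster members are hitting the same job-completion failure is to search the indexed splunkd logs for that SHPSlave message. A minimal sketch, assuming the defaults where splunkd.log lands in _internal with sourcetype=splunkd:

index=_internal sourcetype=splunkd log_level=WARN component=SHPSlave "Failed to notify captain of job completion"
| stats count BY host

A non-zero count on more than one member would suggest this isn't isolated to a single search head.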
Perhaps refer to this issue: https://answers.splunk.com/answers/373084/bug-why-are-my-summary-searches-being-re-run-occas.html
Not in all cases, as I have seen this problem with Splunk 6.3.3, and the bug you refer to is claimed to be fixed in 6.3.0
Can anyone provide a response as to what this means?
From my understanding of the logs, the scheduled search "My Report" was run twice, on two different nodes of the search head cluster. Can someone please confirm this?
Yes, but I believe I found the issue: I had two saved searches with the same name executing at the same time, one private and one at the app level. It hasn't happened since I reported this.
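For anyone else who hits this, one way to spot clashing saved-search names across sharing levels is the saved searches REST endpoint. A sketch, assuming you run it as an admin so private objects are visible, and relying on the standard eai:acl fields:

| rest /servicesNS/-/-/saved/searches
| stats count, values(eai:acl.app) AS apps, values(eai:acl.sharing) AS sharing, values(eai:acl.owner) AS owners BY title
| where count > 1

Any title that appears more than once, especially with different sharing levels or owners, is a candidate for the kind of private-versus-app duplicate described above.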