Alerting

Search Head Clustering: Why am I getting a duplicate alert for one scheduled job?

BP9906
Builder

I'm receiving duplicate alerts for a single scheduled job; it appears to be scheduled twice.
In scheduler.log I see a delegated_remote_error followed by two delegated_remote entries for the same scheduled_time. Why is this happening?

05-20-2015 04:00:27.427 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_error, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 04:00:31.015 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04
05-20-2015 04:00:55.352 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 05:11:11.858 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04 success=1
05-20-2015 05:11:11.878 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC success=1
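
(To confirm the double delegation, a search along the following lines should show any scheduled run that was handed to more than one cluster member. This is only a sketch: it assumes the scheduler log is indexed in _internal with sourcetype=scheduler, and it uses the same fields that appear in the events above.)

index=_internal sourcetype=scheduler savedsearch_name="My Report" status=delegated_remote
| stats dc(peer_guid) AS peers, values(peer_guid) AS peer_guids, values(sid) AS sids BY savedsearch_name, scheduled_time
| where peers > 1

Any row returned means the same scheduled_time for that saved search was delegated to more than one search head.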

dmr195
Communicator

The sequence seems to be that the captain delegates the search to one search head, but suffers the error indicated by "status=delegated_remote_error" in the first log message in the original post. Then, understandably, it delegates the search to a different search head. But the first search head also runs the search.

I have seen the same problem. When I looked at splunkd.log on the search head for which the captain logged status=delegated_remote_error, I saw these warnings:

06-29-2016 10:04:01.014 -0400 WARN  ISplunkDispatch - sid:scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 Gave up waiting for the captain to establish a common bundle version across all search peers; using most recent bundles on all peers instead

06-29-2016 10:04:55.643 -0400 WARN  SHPMasterHTTPProxy - Low Level http request failure err=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n  <messages>\n    <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n  </messages>\n</response>\n"

06-29-2016 10:04:55.643 -0400 WARN  SHPSlave - event=SHPSlave::delegatedJobCompletion Failed to notify captain of job completion job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6. reason=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n  <messages>\n    <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n  </messages>\n</response>\n"
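
(In case it helps with triage, a search roughly like this should show which cluster members are hitting these warnings. It is only a sketch, and it assumes the standard _internal indexing of splunkd.log with the component and log_level field extractions.)

index=_internal sourcetype=splunkd log_level=WARN
    (component=ISplunkDispatch OR component=SHPMasterHTTPProxy OR component=SHPSlave)
    ("report_job_completion" OR "common bundle version")
| stats count BY host, component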

Does this give anyone who knows the inner workings any clues as to what's going wrong?


the_wolverine
Champion

dmr195
Communicator

Not in all cases: I have seen this problem with Splunk 6.3.3, and the bug you refer to is claimed to be fixed in 6.3.0.


ben_leung
Builder

Can anyone explain what these log messages mean?


ben_leung
Builder

From my understanding of the logs, the scheduled search "My Report" was run twice, on two different nodes of the search head cluster. Can anyone confirm this?


BP9906
Builder

Yes, but I believe I found the cause: I had two saved searches with the same name executing at the same time, one private and one at the app level. It hasn't happened since I reported this.
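
(For anyone who wants to check for the same situation, a REST search along these lines should list scheduled searches that share a name across owners or sharing levels. It is only a sketch, run from a single member, and it assumes your role can see the private objects involved.)

| rest /servicesNS/-/-/saved/searches splunk_server=local
| search is_scheduled=1
| rename eai:acl.sharing AS sharing, eai:acl.owner AS owner, eai:acl.app AS app
| stats count, values(sharing) AS sharing, values(owner) AS owners, values(app) AS apps BY title
| where count > 1

Any title returned with count greater than 1 exists in more than one scope and could be scheduled twice.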
