
Search Head Clustering: Why am I getting a duplicate alert for one scheduled job?

BP9906
Builder

I'm receiving duplicate alerts for a single scheduled job; it appears to be scheduled twice.
In the scheduler logs I see a delegated_remote_error, and then the job gets delegated twice. Why is this happening?

05-20-2015 04:00:27.427 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_error, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 04:00:31.015 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04
05-20-2015 04:00:55.352 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 05:11:11.858 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04 success=1
05-20-2015 05:11:11.878 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC success=1
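
In case it's useful, here's a rough search I've been running against the scheduler logs to spot runs that were delegated to more than one cluster member. This is just a sketch: the field names come from the SavedSplunker entries above, and "My Report" is my report's name, so swap in your own (or drop that filter to scan all scheduled searches):

index=_internal sourcetype=scheduler savedsearch_name="My Report" status=delegated_remote
| stats dc(peer_guid) AS delegating_peers values(peer_guid) AS peers values(sid) AS sids BY scheduled_time
| where delegating_peers > 1

Any row it returns is a scheduled_time where the same job was handed to two or more peers.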

dmr195
Communicator

The sequence seems to be that the captain delegates the search to one search head, but suffers the error indicated by "status=delegated_remote_error" in the first log message in the original post. Then, understandably, it delegates the search to a different search head. But the first search head also runs the search.
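
To see that sequence laid out for a given run, something along these lines against the scheduler logs should work (just a sketch using the same fields as the log entries in the original post; the wildcard picks up the error, delegation, and completion statuses):

index=_internal sourcetype=scheduler savedsearch_name="My Report" status=delegated_remote*
| sort 0 _time
| table _time status peer_guid scheduled_time sid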

I have seen the same problem, and when I looked at splunkd.log on the search head that the captain said "status=delegated_remote_error" for, I saw these errors:

06-29-2016 10:04:01.014 -0400 WARN  ISplunkDispatch - sid:scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 Gave up waiting for the captain to establish a common bundle version across all search peers; using most recent bundles on all peers instead

06-29-2016 10:04:55.643 -0400 WARN  SHPMasterHTTPProxy - Low Level http request failure err=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n  <messages>\n    <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n  </messages>\n</response>\n"

06-29-2016 10:04:55.643 -0400 WARN  SHPSlave - event=SHPSlave::delegatedJobCompletion Failed to notify captain of job completion job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6. reason=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n  <messages>\n    <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n  </messages>\n</response>\n"

Does this give anyone who knows the inner workings a clue as to what's going wrong?


the_wolverine
Champion

dmr195
Communicator

Not in all cases: I have seen this problem with Splunk 6.3.3, and the bug you refer to is claimed to be fixed in 6.3.0.


ben_leung
Builder

Can anyone explain what this means?


ben_leung
Builder

From my understanding of the logs, the scheduled search "My Report" was run twice on two different nodes of the search head cluster. Can anyone confirm this?


BP9906
Builder

Yes, but I believe I found the issue: I had two saved searches with the same name executing at the same time, one private and one at app level. It hasn't happened since I reported this.
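
For anyone else who hits this, a quick way to check for same-named saved searches at different sharing levels is a REST search roughly like this (run it as a user who can see private objects, since the rest command only returns what your role can read; splunk_server=local just keeps it to the member you run it on):

| rest /servicesNS/-/-/saved/searches splunk_server=local
| stats count values(eai:acl.app) AS app values(eai:acl.sharing) AS sharing values(eai:acl.owner) AS owner BY title
| where count > 1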
