Alerting

Search Head Clustering: Why am I getting a duplicate alert for one scheduled job?

BP9906
Builder

I'm receiving duplicate alerts for one scheduled job; it appears to be scheduled twice.
I see a delegated_remote_error and then the job is delegated twice. Why is this happening?

05-20-2015 04:00:27.427 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_error, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 04:00:31.015 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04
05-20-2015 04:00:55.352 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC
05-20-2015 05:11:11.858 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=C0F9897B-3E7D-407E-AF7C-573164F3EC04, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_54459_C0F9897B-3E7D-407E-AF7C-573164F3EC04 success=1
05-20-2015 05:11:11.878 -0700 INFO  SavedSplunker - user="me", app="myapp", savedsearch_name="My Report", status=delegated_remote_completion, scheduled_time=1432119600, peer_guid=ADDF764A-CCDF-478E-BF74-EE80F3A905EC, sid = scheduler__me_UFZPLU9uc2FsZQ__RMD5d867250ff1e04aea_at_1432119600_62152_ADDF764A-CCDF-478E-BF74-EE80F3A905EC success=1
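
For reference, the duplicate delegation can be confirmed from these events with a search along these lines (a sketch, assuming the scheduler entries above are indexed in _internal with the default sourcetype=scheduler):

index=_internal sourcetype=scheduler savedsearch_name="My Report" status=delegated_remote
| stats dc(peer_guid) AS delegated_peers values(peer_guid) AS peer_guids BY savedsearch_name scheduled_time
| where delegated_peers > 1

Any row returned means the same scheduled run was handed to more than one cluster member.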

dmr195
Communicator

The sequence seems to be that the captain delegates the search to one search head, but suffers the error indicated by "status=delegated_remote_error" in the first log message in the original post. Then, understandably, it delegates the search to a different search head. But the first search head also runs the search.

I have seen the same problem. When I looked at splunkd.log on the search head for which the captain logged "status=delegated_remote_error", I saw these errors:

06-29-2016 10:04:01.014 -0400 WARN  ISplunkDispatch - sid:scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 Gave up waiting for the captain to establish a common bundle version across all search peers; using most recent bundles on all peers instead

06-29-2016 10:04:55.643 -0400 WARN  SHPMasterHTTPProxy - Low Level http request failure err=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n  <messages>\n    <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n  </messages>\n</response>\n"

06-29-2016 10:04:55.643 -0400 WARN  SHPSlave - event=SHPSlave::delegatedJobCompletion Failed to notify captain of job completion job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6. reason=failed method=POST path=/services/shcluster/captain/jobs/scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6/report_job_completion captain=xsplunkm3d:8089 rc=0 actual_response_code=500 expected_response_code=200 status_line=Internal Server Error error="<response>\n  <messages>\n    <msg type="ERROR">\n In handler 'shclustercaptainjobs': job completion failed peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 sid=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 (reason: event=SHPMaster::handleDelegatedJobCompletion peer=6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 job=scheduler__nobody__prelert__RMD57403f7e9934e6f78_at_1467208920_3438_6408AC6F-431C-41D6-B4EE-7BE2A978D5D6 did not exist in map)</msg>\n  </messages>\n</response>\n"

Does this give anyone who knows the inner workings of search head clustering a clue as to what's going wrong?
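
For what it's worth, to see which cluster members are hitting these failed report_job_completion calls, a search along these lines over the splunkd events should work (a sketch, using the component names from the warnings above):

index=_internal sourcetype=splunkd log_level=WARN (component=SHPMasterHTTPProxy OR component=SHPSlave) report_job_completion
| stats count BY host component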


the_wolverine
Champion

dmr195
Communicator

Not in all cases: I have seen this problem with Splunk 6.3.3, and the bug you refer to is claimed to be fixed in 6.3.0.


ben_leung
Builder

Can anyone explain what this means?


ben_leung
Builder

From my understanding of the logs, the scheduled search "My Report" was run twice on two different nodes of the search head cluster. Can someone confirm this?


BP9906
Builder

Yes, but I believe I found the cause: I had two saved searches with the same name executing at the same time, one private and one at the app level. It hasn't happened since I reported this.
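
In case it helps anyone else, a search along these lines should list scheduled searches that exist more than once under the same name (a sketch using the rest command; the role running it needs read access across app and user namespaces):

| rest /servicesNS/-/-/saved/searches splunk_server=local
| search is_scheduled=1
| stats count values(eai:acl.sharing) AS sharing values(eai:acl.app) AS app BY title
| where count > 1

A count greater than 1, for example one private copy and one app-level copy on the same schedule, will produce what looks like a duplicate alert.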
