I need to create an alert for failed scheduled saved searches. If any scheduled saved search fails to run because of a scheduler problem or for any other reason, the alert should trigger. Can anyone please help me here?
I have looked into this and found several different scheduling status values, as shown in the attachment.
I'm not sure which of these status values I should use for this purpose, so any guidance is welcome.
Thank you for your reply.
So status="skipped" means the scheduled job has failed. Then what about delegated_remote_error?
You can limit it to your search heads by adding a host pattern or list for your search heads to the query below:
index=_internal sourcetype=scheduler status!=success | table _time search_type status user app savedsearch_name
@rob_jordan thank you for your reply.
If I use status!=success, it will match all of the status values mentioned above. But in a successful scheduled run, the job goes through some of those status values before reaching success. So even if the job was scheduled successfully and ran properly, it will still appear in that query.
index=_internal sourcetype=scheduler status IN (skipped, continued)
| table _time search_type status user app savedsearch_name
I think these two should cover you for most scenarios.
Skipped usually means the search was not run, typically because of a user/role capacity limit or something like running out of disk space.
Continued is also bad, as it means the previous run didn't finish before the next run was attempted.
I believe Splunk will only attempt to run one copy of each search at a time unless you override that, which is usually not a good idea.
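If it helps, here is a rough sketch of how that query could be turned into an alert search. The `reason` field and the alias names (`failures`, `last_seen`, `reasons`) are assumptions on my part; check that `reason` is actually extracted in your scheduler events before relying on it, and drop it if not.

```spl
index=_internal sourcetype=scheduler status IN (skipped, continued)
| stats count AS failures latest(_time) AS last_seen values(reason) AS reasons BY app savedsearch_name status
| convert ctime(last_seen)
| sort - failures
```

You could save this as a scheduled alert over a short window (say, the last 15 minutes) and trigger when the number of results is greater than 0. Grouping by savedsearch_name should give you one row per affected search rather than one per event.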
@rob_jordan thank you. I'm using this query, but I have one more question about the continued status. Does it mean that the earlier run failed and the current scheduled run is now executing? Is the current scheduled run successful? In other words, if the status is continued, is the job actually running?
Hmm, on my search heads/search head clusters it seems that success + skipped + continued = total scheduled searches.
host=searchheadpattern index=_internal sourcetype=scheduler
| top 100 status
I only see success, skipped, and continued. I'm thinking that's all you need to tell whether there is an issue at a high level. There could be counterpart errors on the indexers, but on the search head it will likely be reported as skipped or continued.
I think your deployment is not distributed. That's why you are not able to see the delegated-related status values.
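To see which status values actually show up on each instance, a per-host breakdown along these lines might be useful. This is only a sketch; it assumes the scheduler events from all of your search heads are searchable from where you run it.

```spl
index=_internal sourcetype=scheduler
| stats count BY host status
| sort host - count
```

If any delegated status values exist in your environment, they should appear here next to the host that logged them.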