Splunk Search

How to send an alert when a Job does not finish within expected time?

Explorer

Hi Everyone,

I am a newbie to Splunk and need a little help with the alerting system. I want to set up a real-time alerting system based on the use case below. I have a job, 'Job1', which starts at 11:00 am. The expected finish time for Job1 is 30 mins, i.e. 11:30 am. Here, the expected finish time is the average duration (i.e. run time) of Job1 over the past 90 days. Now, if Job1 has not finished by 11:30 am, I need to send an alert indicating that Job1 is still running. Is it possible to do something like this with Splunk?
I will send events to Splunk like:
1. Job1 - Start Time (i.e. 11:00 am)
2. Job1 - Expected Finish Time (11:30 am) OR Job1 - Expected Duration (i.e. 30 mins) [I can send either of these]
3. Job1 - Actual Finish Time (11:40 am)

Also, can Splunk calculate the expected finish time from Job1's history over the previous 90 days? That would eliminate the need to send event 2.

Requesting help with this case.

Regards,
Sneha Salvi

1 Solution

SplunkTrust

You can use Splunk's scheduler log to get both the historical job run times and the current start/finish events. Please note that by default only 30 days of the _internal index's data are retained, so a 90-day average may not be possible unless you increase the retention period of _internal to 90 days.

Do you want to alert as soon as the job doesn't finish within the expected time, or just a reactive alert that checks today's run and alerts if it took more time than usual?

Assuming it's the latter, give this a try in the meanwhile.
Cron: 10 12 * * *
Search:

index=_internal sourcetype=scheduler savedsearch_name="YourScheduledJobName" status=success earliest=-90d
| stats avg(run_time) as ExpectedDuration
| appendcols
    [ search index=_internal sourcetype=scheduler savedsearch_name="YourScheduledJobName" earliest=@d
    | dedup savedsearch_name
    | eval StartTime=strftime(dispatch_time,"%F %T")
    | eval FinishTime=if(status="success", strftime(_time,"%F %T"), "still running")
    | eval Duration=if(status="success", run_time, now()-dispatch_time)
    | eval status=if(status="success", "success", "running") ]
| where status="running" OR Duration>ExpectedDuration

Alert condition: trigger if the number of events is greater than 0.
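If Job1 is an external job and you index your own start/finish events (as described in the question) rather than relying on the scheduler log, the same idea works against those events. A minimal sketch, assuming hypothetical field names job, status, and duration (in seconds) in a hypothetical index called jobs:

index=jobs job="Job1" status="finished" earliest=-90d
| stats avg(duration) as ExpectedDuration
| appendcols
    [ search index=jobs job="Job1" status="started" earliest=@d
    | head 1
    | eval RunningFor=now()-_time ]
| where RunningFor > ExpectedDuration

This compares how long today's run has been going against the 90-day average duration, so the alert fires only once the run is overdue. Adjust the index and field names to match whatever your events actually contain.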



Explorer

Actually, I want to alert as soon as the job doesn't finish within the expected time (ExpectedFinishTime = StartTime + Avg(RunTime)).
For example: if the expected run time is 30 mins and the job started at 11:00 am, I would want to send the alert at 11:31 am or so.
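For an alert that fires as soon as the expected time passes, the daily check above can instead be scheduled every few minutes (e.g. cron */5 * * * *), so the alert triggers within one scheduling interval of the expected finish time. A sketch, again using the hypothetical jobs index and job/status/duration field names from the question:

index=jobs job="Job1" earliest=-1d
| stats latest(status) as LastStatus, latest(eval(if(status="started",_time,null()))) as LastStart
| appendcols
    [ search index=jobs job="Job1" status="finished" earliest=-90d
    | stats avg(duration) as ExpectedDuration ]
| where LastStatus="started" AND now()-LastStart > ExpectedDuration

With the alert condition "number of results greater than 0", this stays silent while the job is within its expected window and fires on the first scheduled run after the window is exceeded. A true real-time alert isn't a natural fit here, since the condition is the absence of a finish event rather than the presence of one.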
