Situation: I have jobs that start running at different times because they depend on previous jobs completing successfully. There are two events I am concerned with: one for a job that ran on time and another for a job that did not finish.
Goal: Alert when a job has run on time, and alert when a job did not finish.
Is this possible in Splunk? Each job has a deadline (the time by which it should have started in order to complete by the due-out time) and a due-out time. Any help is appreciated.
Let's say we have jobs named job1 and job2:
job1's DUE-OUT time is 1:00 pm, and it completes its run at 12:50 pm. I need an alert triggered to let me know this job completed at 1:00 pm or earlier (at the moment it completes).
job2's DUE-OUT time is 3:00 pm, but it does not complete its run until 3:30 pm. I need an alert triggered at 3:00 pm to let me know this job did not complete by its due-out time.
Sounds good. I have provided my approach below. It is a little rough, though, and I would prefer to use a set due-out time instead of an average expected time:
"jobLabel"="name of job" | spath "msg.ResponseCode" | search "msg.ResponseCode"=| spath "msg.TypeOfRecord" | search "msg.TypeOfRecord"= | spath "msg.MessageReturn" | search "msg.MessageReturn"=*
| stats earliest(timestamp) as StartTime, latest(timestamp) as EndTime, avg(EndTime-StartTime) as ExpectedDuration | eval StartTime=strftime(StartTime,"%F %T") | eval EndTime=if("msg.MessageReturn"="PROCESSED RECORD", _time,now()-EndTime)| eval Duration=if("msg.MessageReturn"="PROCESSED RECORD", run_time,now()-EndTime) | eval status=if("msg.MessageReturn"!="PROCESSED RECORD","running","success")
Hopefully this helps! Thank you
You still haven't provided sample data, but I may have enough to work with.
How is DUE-OUT determined? Is it a fixed time after StartTime?
If you want a set due out time, why not use
eval ExpectedDuration=3600 or similar?
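For illustration, here is a rough sketch of how a fixed ExpectedDuration could drive both alerts. It is untested against your data and rests on several assumptions: your_index is a placeholder, the events are JSON so spath can extract msg.MessageReturn, _time stands in for your timestamp field, a "PROCESSED RECORD" event marks completion, and DUE-OUT is simply StartTime plus ExpectedDuration. The status labels are made up for this example.

```
index=your_index jobLabel="name of job"
| spath path=msg.MessageReturn output=MessageReturn
| stats earliest(_time) as StartTime, max(eval(if(MessageReturn="PROCESSED RECORD", _time, null()))) as EndTime by jobLabel
| eval ExpectedDuration=3600
| eval DueOut=StartTime+ExpectedDuration
| eval status=case(isnotnull(EndTime) AND EndTime<=DueOut, "on time", isnotnull(EndTime), "late", now()>DueOut, "missed due-out", true(), "running")
| eval StartTime=strftime(StartTime, "%F %T"), EndTime=strftime(EndTime, "%F %T"), DueOut=strftime(DueOut, "%F %T")
| where status="on time" OR status="missed due-out"
```

Saved as a scheduled alert that triggers when the result count is greater than zero, this would fire for a job that finished by its due-out time and for a job that is past due-out with no completion event. You would probably want throttling on jobLabel so the "on time" result does not alert again on every scheduled run. If DUE-OUT is really a fixed clock time rather than an offset from StartTime, something like relative_time(now(), "@d+13h") for 1:00 pm could replace the DueOut calculation. Note that a job with no events at all in the search window produces no row, so it would not be flagged by this search.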
That helps explain the logic, but doesn't say what fields are available. Please share some exact events (sanitized as necessary).