Splunk Search

How to create an alert to notify when a job runs over a specified time?

smanojkumar
Communicator

My requirement is to send a notification when a job runs longer than a specified time:

Condition 1 - The first job of every day should run for less than 45 minutes; if it exceeds 45 minutes, trigger an alert.

Condition 2 - All other jobs on any day should not exceed 10 minutes; if one exceeds 10 minutes, trigger an alert.

Condition 3 - If the jobs do not run every 15 minutes (a job needs to start every 15 minutes), trigger an alert.

gcusello
Esteemed Legend

Hi @smanojkumar,

Some questions:

  • First, if a job doesn't run, do you get an event or not?
  • I suppose that you have different processes to monitor on each host; is this correct?
  • How do you get the duration information? Is there a duration field, or do you have two events (start/end) to correlate?
  • When you say the first job of the day, do you mean the first occurrence of each kind of job or the first occurrence of any job?

The first question is essential: if you don't get an event when a process fails to run (which is the normal behavior), you need to list the processes to monitor in a lookup and then check for the missing ones with an alert like the following:

index=your_index
| eval process=lower(process), host=lower(host)
| stats count BY host process
| append [ | inputlookup processes.csv | eval process=lower(process), host=lower(host), count=0 | fields process host count ]
| stats sum(count) AS total  BY host process
| where total=0

The second question is relevant for the above search.

The third and fourth questions are the most relevant for answering your question, because we need to understand how to extract the events containing the process duration value.
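
To illustrate the two-event case from the third question, here is a rough sketch. It assumes the same placeholder index and the `process` and `host` fields used in the search above, and it assumes the first and last events of a run mark its start and end. It only holds if the search time range contains a single run per host and process; `start`, `end` and `duration_min` are illustrative names.

```spl
index=your_index
| eval process=lower(process), host=lower(host)
| stats min(_time) AS start max(_time) AS end BY host process ```assumes one run per host and process in the time range```
| eval duration_min=round((end-start)/60, 1)
```

If the events already carry a duration field, the stats step is not needed and the threshold can be applied to that field directly.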

Ciao.

Giuseppe

smanojkumar
Communicator

Hi @gcusello, thanks for your response!

 

`sap-abap(SM37,"")` source="PR1" JOBLOGDATA="***"
| search (JOBNAME=J1 OR JOBNAME=J2 OR JOBNAME=J3 OR JOBNAME=J4 OR JOBNAME=J5 OR JOBNAME=J6)
| rename STATUS as status
| table _time , status , STRTTIME , ENDTIME , EVENT_TYPE

status - Failed/Running

1. We can't get an event if the job is not running.

2. We have only six jobs to monitor.

3. We have the fields STRTDATE, STRTTIME, ENDDATE, and ENDTIME.

4. The first occurrence of each of the six jobs.

 

gcusello
Esteemed Legend

Hi @smanojkumar,

Must the six processes be checked on all hosts?

Are the STRTTIME, ENDTIME, and EVENT_TYPE fields in the same event, or must they be taken from several events?

If the check applies to all hosts and the above fields come from a single event, try something like this:

`sap-abap(SM37,"")` source="PR1" JOBLOGDATA="***" JOBNAME IN ("J1","J2","J3","J4","J5","J6")
| rename STATUS as status
| stats 
   earliest(_time) AS _time 
   values(status) AS status 
   values(STRTTIME) AS STRTTIME
   values(ENDTIME) AS ENDTIME
   values(EVENT_TYPE) AS EVENT_TYPE
   count
   BY JOBNAME
| append [ | makeresults | eval JOBNAME="J1", count=0 | fields JOBNAME, count ]
| append [ | makeresults | eval JOBNAME="J2", count=0 | fields JOBNAME, count ]
| append [ | makeresults | eval JOBNAME="J3", count=0 | fields JOBNAME, count ]
| append [ | makeresults | eval JOBNAME="J4", count=0 | fields JOBNAME, count ]
| append [ | makeresults | eval JOBNAME="J5", count=0 | fields JOBNAME, count ]
| append [ | makeresults | eval JOBNAME="J6", count=0 | fields JOBNAME, count ]
| stats 
   earliest(_time) AS _time 
   values(status) AS status 
   values(STRTTIME) AS STRTTIME
   values(ENDTIME) AS ENDTIME
   values(EVENT_TYPE) AS EVENT_TYPE
   sum(count) AS total 
   BY JOBNAME
| where total=0
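
The search above flags a job only when it has no events at all in the search time range. For the 15-minute schedule in condition 3, a minimal sketch could compare each job's latest start with the current time. It assumes STRTDATE and STRTTIME are strings in YYYYMMDD and HHMMSS form, which is not confirmed in this thread, so adjust the strptime format to your data. `start`, `last_start` and `since_min` are illustrative names.

```spl
`sap-abap(SM37,"")` source="PR1" JOBLOGDATA="***" JOBNAME IN ("J1","J2","J3","J4","J5","J6")
| eval start=strptime(STRTDATE." ".STRTTIME, "%Y%m%d %H%M%S") ```assumed date and time formats, adjust as needed```
| stats max(start) AS last_start BY JOBNAME
| eval since_min=round((now()-last_start)/60, 1)
| where since_min > 15
```

A job with no events in the time range does not appear in this result, so this check complements the missing-job search rather than replacing it.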

One hint: don't use the search command after the main search, because it makes the search slower; put the search terms as far left as possible in your search.
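
For conditions 1 and 2 from the original question, the duration can be derived from the start and end fields. This is a minimal sketch, assuming each run's event carries STRTDATE, STRTTIME, ENDDATE and ENDTIME as YYYYMMDD and HHMMSS strings (the formats are not shown in this thread). It covers only runs that have already ended; `run_of_day`, `limit_min` and `duration_min` are illustrative names.

```spl
`sap-abap(SM37,"")` source="PR1" JOBLOGDATA="***" JOBNAME IN ("J1","J2","J3","J4","J5","J6")
| eval start=strptime(STRTDATE." ".STRTTIME, "%Y%m%d %H%M%S") ```assumed date and time formats, adjust as needed```
| eval end=strptime(ENDDATE." ".ENDTIME, "%Y%m%d %H%M%S")
| where isnotnull(start) AND isnotnull(end)
| dedup JOBNAME start ```keep one event per run in case a run is logged more than once```
| eval duration_min=round((end-start)/60, 1), day=strftime(start, "%Y-%m-%d")
| sort 0 JOBNAME start
| streamstats count AS run_of_day BY JOBNAME day
| eval limit_min=if(run_of_day=1, 45, 10) ```45 minutes for the first run of the day, 10 for the rest```
| where duration_min > limit_min
| table JOBNAME day run_of_day STRTTIME ENDTIME duration_min limit_min
```

The time range has to start at or before midnight for `run_of_day` to identify the real first run of each job; with a shorter range, the first run inside the range would get the 45-minute limit instead.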

Ciao.

Giuseppe
