This is the first time I am using an advanced conditional alert in savedsearches.conf.
I'd like some feedback on my current configuration for monitoring scheduled jobs.
If a job is hung for x amount of time, the alert should fire. However, one job was manually suspended last night and no alert was sent. Here is a sample of my savedsearches.conf, including the search:
action.email.inline = 1
action.script = 1
action.script.filename = email_alert.sh
alert.digest_mode = True
alert.expires = 24h
alert.suppress = 0
alert.track = 1
alert_condition = | where last_run_ago_seconds>7200
counttype = custom
cron_schedule = 00 09,10,11,12,13,14,15,16,17,18,19,20,21,22 * * *
displayview = flashtimeline
enableSched = 1
search = index=index earliest=-60m@m latest=@m sourcetype=blah <servicenamehere> | head 100 | stats latest(_time) as last_seen, first(host) as host_start by service | addinfo | eval last_run_ago_seconds=round( info_search_time-last_seen ) | stats min(last_run_ago_seconds) as last_run_ago_seconds, values(host_start) as host_start by service | fillnull value="n/a" host_start | eval message=if(last_run_ago_seconds>7200, "This Job May Be Hung", "Job Looks OK") | table service,last_run_ago_seconds,host_start,message
When I run the search manually, things look OK, but I want to make sure my use of alert_condition and counttype is correct. If there is another way to trigger a similar alert, I am open to suggestions.
The spec file mentions that, if you include an alert_condition, you should not set counttype, relation, or quantity. I've corrected a discrepancy in older versions of our documentation that stated otherwise.
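As a rough sketch of what that note implies for the stanza in the question, the version below keeps alert_condition and simply drops counttype. The stanza name is hypothetical (the original post does not show it), and every other setting is copied unchanged from the question, so treat this as an illustration rather than a verified fix.

```ini
# Hypothetical stanza name; the original post does not show the real one.
[Hung Job Monitor]
action.email.inline = 1
action.script = 1
action.script.filename = email_alert.sh
alert.digest_mode = True
alert.expires = 24h
alert.suppress = 0
alert.track = 1
# Custom condition only: counttype, relation, and quantity are left unset,
# following the spec note quoted above.
alert_condition = | where last_run_ago_seconds>7200
cron_schedule = 00 09,10,11,12,13,14,15,16,17,18,19,20,21,22 * * *
displayview = flashtimeline
enableSched = 1
# Search copied as-is from the question, including its placeholders.
search = index=index earliest=-60m@m latest=@m sourcetype=blah <servicenamehere> | head 100 | stats latest(_time) as last_seen, first(host) as host_start by service | addinfo | eval last_run_ago_seconds=round( info_search_time-last_seen ) | stats min(last_run_ago_seconds) as last_run_ago_seconds, values(host_start) as host_start by service | fillnull value="n/a" host_start | eval message=if(last_run_ago_seconds>7200, "This Job May Be Hung", "Job Looks OK") | table service,last_run_ago_seconds,host_start,message
```

One thing that may be worth checking separately: the search only looks back 60 minutes (earliest=-60m@m), so last_seen can be at most roughly 3600 seconds old and last_run_ago_seconds may never exceed 7200. If a suspended job writes no events in that window, the search would likely return no rows at all, which could explain why nothing was sent. Testing with a wider time range (for example, one comfortably longer than the 7200-second threshold) might confirm this.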