I have created an alert that fires when the response time is high and the service is down, and scheduled it with a cron schedule that runs every 2 minutes. It notifies me when the server is down.
I also want a recovery alert: when the service is up again, the alert should trigger only once after the down event.
I created this Up alert with a low-response-time condition, but it triggers every 2 minutes and sends an email each time, which is not what I need.
I need only one email notification after the outage, saying that the service is up again and running normally.
There are probably more ways to tackle this problem but the easiest one that comes to mind would be to add an additional action to the original alert that writes an entry into a lookup.
Then, in the search for the recovery "alert", you'd check not only for the "it's working OK" condition but also for the contents of the lookup, and only trigger the alert if the lookup showed a "bad" state. The recovery alert would also need an action that "clears" the lookup.
Effectively, you'd keep a flag-like entry in a lookup.
Thank you for your suggestions, but I am still struggling to make it work.
Could you please guide me further on this?
I have defined the query below for the Down alert, which has the condition Percentile_80>=5. So whenever Percentile_80 is <5, the Up alert should fire. It should trigger only once instead of every 2 minutes, and when the Down alert triggers again later, the Up alert should trigger once again after it.
| eval RespTime=time_taken/1000
| eval RespTime = round(RespTime,2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as "Percentile_80" by _time HUB cs_method
| where Percentile_80>=5
OK. My idea was good but I missed that you can't just insert an arbitrary value into the lookup but have to save the whole result set. That makes life more complicated.
Without doing any custom actions you can do it differently.
Add an "Add to Triggered Alerts" action to both of your alerts (the Down alert and the Recovery alert).
In your Recovery alert you can then "filter" on the output of a rest search to see whether the Recovery alert has already fired more recently than the Down alert. Assuming that your alerts are called "My Down Alert" and "My Recovery Alert", you do it like this:
<your recovery alert search>
| appendcols [
| rest /servicesNS/admin/-/alerts/fired_alerts/-
| stats max(trigger_time) as time by savedsearch_name
| search savedsearch_name IN ("My Down Alert", "My Recovery Alert")
| transpose header_field=savedsearch_name
| table "My Down Alert" "My Recovery Alert" ]
| filldown "My Down Alert" "My Recovery Alert"
Now every result in your recovery alert search has two additional fields, "My Down Alert" and "My Recovery Alert", which contain the time of the last trigger of each of those alerts. You'll most likely want to limit your recovery alert to the situation in which the last triggered recovery alert is older than the last down alert, that is, the down alert fired more recently:
| where 'My Down Alert' > 'My Recovery Alert'
(mind the quotes here!)
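Putting the pieces together, the whole recovery alert could look roughly like the sketch below. It assumes the alert names used above and reuses the stats from the Down alert with the condition flipped to Percentile_80<5. The `<your base search>` line is only a placeholder for whatever precedes the first eval in the original query, which wasn't posted, so treat this as a starting point rather than a tested search.

```spl
<your base search>
| eval RespTime=round(time_taken/1000, 2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as Percentile_80 by _time HUB cs_method
| where Percentile_80<5
| appendcols [
  | rest /servicesNS/admin/-/alerts/fired_alerts/-
  | stats max(trigger_time) as time by savedsearch_name
  | search savedsearch_name IN ("My Down Alert", "My Recovery Alert")
  | transpose header_field=savedsearch_name
  | table "My Down Alert" "My Recovery Alert" ]
| filldown "My Down Alert" "My Recovery Alert"
| where 'My Down Alert' > 'My Recovery Alert'
```

If the alert is set to trigger when the number of results is greater than zero, it should only fire while the most recent Down alert is newer than the most recent Recovery alert. Once the Recovery alert itself has been added to the triggered alerts, the final where filters everything out until the next Down alert.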
The downside to this method, which could require additional handling, is that triggered alert records expire after some time, so the rest search may no longer return one or both of them.
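One case that probably needs this extra handling: if there is no record for the Recovery alert (it expired, or it has never fired), the "My Recovery Alert" field is empty, and a where comparison against an empty field does not match, so the recovery alert would never trigger. A possible sketch, under the assumption that a missing record should be treated as "never fired", is to fall back to 0 with coalesce in the final where:

```spl
| where 'My Down Alert' > coalesce('My Recovery Alert', 0)
```

If the Down alert record is the one that is missing, this still returns nothing, which seems to be the safer behaviour since there is no known outage to recover from. Giving both alerts a sufficiently long expiration time is the other thing to consider.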