Hey all.
So there's this thing I periodically trawl the web for and never seem to find any results on.
It's a small thing, but a huge bugbear.
I really wanted all of our PagerDuty alerts raised through the PagerDuty app in Splunk to auto-resolve once the triggering condition had cleared.
Today I finally managed to get this working, and I wanted to share it in case it helps someone else, and to see if anyone has done this before or found a better way.
The trick was a fairly new PagerDuty feature called Event Rules.
Essentially, you create an Event Ruleset, which gets you a new integration URL and routing key, and lets you apply rules to non-standard fields in the payload.
As an example, suppose your alert produces a result payload like this:
"result": {
"routing_key":"your eventruleset routing key",
"dedup_key":"yourunique key, maybe the alert name",
"event_action":"trigger or resolve",
"severity":"critical or warning etc",
"summary":"some high level info",
"source":"usually the machine with a problem"
}
Your event rules might then look something like this:
if result.event_action = resolve, then resolve the incident using the dedup_key
if result.event_action = trigger and result.severity = critical, then set severity to critical and raise a new incident
etc.
Now, this does raise a small problem: if your alert search returns no rows, nothing is sent, so you can never send the resolve.
The search below is extremely simple but does the job as an example, by adding a stats count so the search always returns exactly one row. The only caveat is that if you want an alert per entity, you need to do some more work (see the sketch after the search).
index=_internal ERROR
| stats count as event_count
| eval dedup_key="ddddd"
| eval severity="warning"
| eval event_action=case(event_count>0,"trigger",1=1,"resolve")
| eval summary="A summary of this event"
| eval source="servera.example.com"
| eval routing_key="XXXXXXXXX"
| table dedup_key, severity, event_action, summary, source, routing_key
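For the per-entity case, one option is to pad the results so that every entity you care about always produces a row, and therefore always sends either a trigger or a resolve. A rough sketch, assuming a hypothetical monitored_hosts.csv lookup listing the hosts you expect to report:

index=_internal ERROR
| stats count as event_count by host
| append
    [| inputlookup monitored_hosts.csv
    | fields host
    | eval event_count=0]
| stats max(event_count) as event_count by host
| eval dedup_key="internal_errors_".host
| eval severity="warning"
| eval event_action=if(event_count>0,"trigger","resolve")
| eval summary="Internal errors on ".host
| eval source=host
| eval routing_key="XXXXXXXXX"
| table dedup_key, severity, event_action, summary, source, routing_key

Each host then gets its own dedup_key, so PagerDuty can track and resolve each one independently.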
All that's required after this is to update the integration URL in your PagerDuty alert action to use the Event Rules endpoint and the ruleset's routing key.
This can, however, be spammy, because an API call is made every time the alert runs.
To get past that, I created a KV store lookup called state_alert and do an outputlookup to it each time the alert runs. The fields I record are _key=dedup_key, event_action, date_last_change and date_last_run.
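For reference, a KV store lookup needs to be defined before you can write to it. A minimal sketch, assuming it lives in the same app as the alert:

collections.conf:
[state_alert]

transforms.conf:
[state_alert]
external_type = kvstore
collection = state_alert
fields_list = _key, event_action, date_last_change, date_last_run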
In the PagerDuty alert's properties you can then change the alert trigger to custom and set the trigger condition to "where event_action!=event_action_lookup".
Now it will only fire if this run's event_action (trigger or resolve) is different from the last run's. This reduces the noise and allows for some stateful tracking of each alert if so desired.
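One caveat worth noting: on the very first run of a given alert there is no row in the KV store yet, so event_action_lookup is null and the != comparison evaluates to false, meaning the alert would never fire. A null-safe version of the trigger condition might be:

where event_action!=event_action_lookup OR isnull(event_action_lookup)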
Below is an example of the code I put after the table command in the example above to handle this logic:
| eval _key=dedup_key
| eval date_last_run=now()
| join type=left _key
    [| inputlookup state_alert
    | rename event_action AS event_action_lookup
    | rename date_last_change AS date_last_change_lookup
    | fields _key, event_action_lookup, date_last_change_lookup]
| eval date_last_change=case(event_action!=event_action_lookup OR isnull(event_action_lookup), now(), 1=1, date_last_change_lookup)
| outputlookup state_alert append=true key_field=_key
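As a side benefit, the stored date_last_change makes it easy to see how long each alert has been in its current state, with something like:

| inputlookup state_alert
| fields _key, event_action, date_last_change
| eval minutes_in_state=round((now()-date_last_change)/60,0)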
Hopefully someone finds this useful, improves on it, or shows me a better way 🙂
Awesome stuff! I reproduced your instructions in a step-by-step guide.