Hi, I need to report on when a Notable alert was changed from the default "unassigned" status to "Acknowledged", and from "Acknowledged" to "Resolved", along with the time elapsed between each status change. Basically, we are trying to create a dashboard of all alerts whose SLA was missed.
We have an SLA of 10 minutes for a notable alert to be picked up, meaning an analyst should change its default "unassigned" status to "Acknowledged" within that time. Likewise, there is an SLA of 30 minutes to move it from "Acknowledged" to "Resolved".
When I run the following query, Splunk shows the _time value at which each alert was Acknowledged and at which it was Resolved, but it does NOT show when the alert was triggered/generated. That leaves me without a starting point to compare against.
1) How can I compose a query that lists all alerts (rule_name) which were acknowledged more than 10 minutes after being triggered and resolved more than 30 minutes after being acknowledged?
I am assuming this will involve some eval logic to calculate the difference acknowledged_time minus triggered_time and check whether it is greater than 10 minutes. If it is, then eval SLA_status = breached, else SLA_status = met. Likewise for resolved_time, measured against acknowledged_time with the 30-minute limit.
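To make the logic I have in mind concrete, here is a rough sketch in Python (not SPL). The function names and the three timestamp arguments are made up for illustration, and it assumes the triggered time is already available, which is exactly the piece I am missing in Splunk.

```python
from datetime import datetime, timedelta

# SLA limits as stated above: 10 minutes to acknowledge, 30 minutes to resolve.
ACK_SLA = timedelta(minutes=10)
RESOLVE_SLA = timedelta(minutes=30)


def sla_status(start, end, limit):
    """Return 'breached' if the elapsed time exceeds the limit, else 'met'."""
    return "breached" if (end - start) > limit else "met"


def evaluate_alert(triggered_time, acknowledged_time, resolved_time):
    """Compute both durations (in minutes) and the SLA status for one alert.

    All three arguments are hypothetical per-alert timestamps.
    """
    ack_delta = acknowledged_time - triggered_time
    resolve_delta = resolved_time - acknowledged_time
    return {
        "ack_minutes": ack_delta.total_seconds() / 60,
        "ack_SLA_status": sla_status(triggered_time, acknowledged_time, ACK_SLA),
        "resolve_minutes": resolve_delta.total_seconds() / 60,
        "resolve_SLA_status": sla_status(acknowledged_time, resolved_time, RESOLVE_SLA),
    }


if __name__ == "__main__":
    # Made-up example: acknowledged after 12 minutes, resolved 28 minutes later.
    result = evaluate_alert(
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 12),
        datetime(2024, 1, 1, 9, 40),
    )
    print(result)
```

In this made-up example the acknowledgement SLA would be breached (12 minutes) and the resolution SLA met (28 minutes). What I need is the SPL equivalent of this comparison.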
I am assuming a lot of you ES folks are already tracking this kind of SLA metric one way or another. Kindly assist.