Hi, I need to report on when a notable alert was changed from the default "unassigned" status to "Acknowledged", and from "Acknowledged" to "Resolved", along with the time elapsed between each status change. Basically, we are trying to create a dashboard of all alerts whose SLA was missed.
We have a 10-minute SLA for a notable alert to be picked up, meaning an analyst should change its default "unassigned" status to "Acknowledged" within that time. Likewise, there is a 30-minute SLA for moving it from "Acknowledged" to "Resolved".
The following query shows the _time value at which each alert was Acknowledged and Resolved, but it does NOT show when the alert was triggered/generated. That leaves me without a starting point to compare against.
| `incident_review`
| table _time rule_id rule_name owner reviewer status_label
| where _time > relative_time(now(),"-1d@d")
| eval Status_Time=strftime(_time,"%Y-%m-%d %H:%M:%S")
Output:
_time | rule_id | rule_name | owner | reviewer | status_label |
07 July 2022 08:00:00 | xxxxx | AWS001_xx | John | John | Acknowledged |
07 July 2022 08:10:00 | xxxxx | AWS001_xx | John | John | Resolved |
07 July 2022 08:01:00 | yyyyy | AWS002_xx | Jerry | Jerry | Acknowledged |
1) How can I compose a query that lists all alerts (rule_name) which were acknowledged more than 10 mins late or resolved more than 30 mins late?
I am assuming this will involve some eval logic to calculate the difference between acknowledged_time and triggered_time and check whether it is > 10 mins. If it is, then eval SLA_status = breached, else SLA_status = met. Likewise for resolved_time.
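One possible shape for that logic, as a rough and untested sketch rather than a confirmed answer: pivot the `incident_review` rows into one row per rule_id, then pull in the time each notable was created. This rests on assumptions that should be checked in your environment: that the `notable` macro returns each notable with _time equal to its creation time and an event_id field that matches rule_id in the review history, and that the search time range covers both the notable and its status changes. The field names triggered_time, ack_time, resolved_time, ack_delay_min, resolve_delay_min, ack_SLA and resolve_SLA are made up for illustration.

```
| `incident_review`
| where _time > relative_time(now(),"-1d@d")
| stats min(eval(if(status_label=="Acknowledged",_time,null()))) as ack_time
        min(eval(if(status_label=="Resolved",_time,null()))) as resolved_time
        by rule_id rule_name
| join type=left rule_id
    [ search `notable`
      | eval triggered_time=_time
      | rename event_id as rule_id
      | fields rule_id triggered_time ]
| eval ack_delay_min=round((ack_time-triggered_time)/60,1)
| eval resolve_delay_min=round((resolved_time-ack_time)/60,1)
| eval ack_SLA=case(isnull(ack_delay_min),"unknown", ack_delay_min>10,"breached", true(),"met")
| eval resolve_SLA=case(isnull(resolve_delay_min),"unknown", resolve_delay_min>30,"breached", true(),"met")
| where ack_SLA="breached" OR resolve_SLA="breached"
| table rule_id rule_name triggered_time ack_time resolved_time ack_delay_min resolve_delay_min ack_SLA resolve_SLA
```

The times stay in epoch seconds so the subtraction works; strftime can be applied at the end for display. Keep in mind that join is subject to subsearch row and runtime limits, so on a busy system a lookup-based or stats-based correlation may be the safer choice.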
I am assuming a lot of you ES folks must be doing this kind of SLA metrics tracking some way or other. Kindly assist.
Thanks in advance
Did you make any progress on this one 😥?