
Help generating a dynamic alert

jperrySplunk
New Member

I have a summary index that is populated every 5 minutes from a report. The report shows when each panel was last updated and its current status. A panel's status changes from normal to degraded if its last update was more than 480 minutes ago.
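
To make the boundary explicit, here is a minimal Python sketch of that status rule (not Splunk code). It assumes "more than 480 minutes" means strictly greater than 480, so a panel last updated exactly 480 minutes ago is still normal; the names panel_status and STALE_AFTER_MINUTES are hypothetical.

```python
# Hypothetical model of the report's status rule (not SPL): a panel is
# "degraded" when its last update is MORE than 480 minutes old.
STALE_AFTER_MINUTES = 480  # threshold taken from the post


def panel_status(minutes_ago: int) -> str:
    """Return the Status the report would assign for a given Minutes_Ago."""
    return "degraded" if minutes_ago > STALE_AFTER_MINUTES else "normal"


if __name__ == "__main__":
    for minutes in (20, 480, 481):
        print(minutes, panel_status(minutes))
```

If the report actually uses >= rather than >, only the comparison changes.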

The events in the summary index look like this (not including the fields added by the summary indexing process):

03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-1", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-2", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-3", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-4", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-5", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-6", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-7", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-8", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-9", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-10", Status=normal
03/09/2020 18:30:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-11", Status=normal

03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-1", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-2", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-3", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-4", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-5", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-6", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-7", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-8", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-9", Status=degraded
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-10", Status=normal
03/09/2020 18:25:00 +0000, Last_Received="2020-03-09 18:10:20 GMT", Minutes_Ago=20, Panel="Panel-11", Status=normal

I am having trouble with the final piece of the dynamic alert. I need an alert that runs every 5 minutes and sends an email when "Status" has changed within the last 10 minutes. The logic is to generate a "system degraded" alert if any panel changes from "normal" to "degraded", and a "system returned to normal" alert when the degraded panels have returned to normal AND all panels currently have a status of "normal".

My current alert search fires when there is a change in status, but only for an individual panel. I have tried using eventstats and streamstats to get a count but have not been successful. The hard part is not scheduling an alert that checks the status; I only want an alert when the status has changed. For instance, when a panel first changes from normal to degraded I want an alert, but I don't want an alert saying it is degraded every 5 minutes until it returns to normal. I only want the initial degraded alert and then an alert when the panel has returned to normal, which may be several hours later.
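
The "alert on change, not on state" idea is edge triggering: compare each run with the run before it and only report when the value differs. A minimal Python sketch for a single panel follows (not Splunk code); the function name status_changes and the sample history are hypothetical.

```python
# Hypothetical model (not SPL) of edge-triggered alerting: compare each run's
# status with the previous run and alert only when the value changes.
from typing import List, Tuple


def status_changes(history: List[str]) -> List[Tuple[int, str, str]]:
    """Return (run_index, old_status, new_status) for every run whose status
    differs from the run before it; unchanged runs produce nothing."""
    changes = []
    for i in range(1, len(history)):
        if history[i] != history[i - 1]:
            changes.append((i, history[i - 1], history[i]))
    return changes


if __name__ == "__main__":
    # One panel sampled every 5 minutes: it degrades, stays degraded, recovers.
    runs = ["normal", "degraded", "degraded", "degraded", "normal"]
    for run, old, new in status_changes(runs):
        print(f"run {run}: {old} -> {new}")
```

Only runs 1 and 4 produce output: one message at the onset and one at recovery, with nothing while the panel stays degraded.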

Here is the alert search I am currently testing with:

index=my_alerts earliest=-10m latest=now()
| stats latest(Status) as Latest_Status, earliest(Status) as Previous_Status by Panel
| eval status_change=case(Previous_Status != Latest_Status, "Y", Previous_Status = Latest_Status, "N")
| eval system_status=case(status_change="Y" AND Latest_Status="degraded", "System is degraded",
    status_change="Y" AND Latest_Status="normal", "System has returned to normal",
    status_change="N", "System status has not changed")
| where status_change="Y"
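
The per-panel limitation is easier to see in a small Python model of what this search computes (not Splunk code). It assumes the 10-minute window holds at most two runs per panel, passed in as two hypothetical Panel-to-Status dictionaries; per_panel_messages is an illustrative name.

```python
# Hypothetical model (not SPL) of the search above: for each Panel, compare the
# earliest and latest Status in the 10-minute window and emit a message only
# when they differ. Each panel is judged on its own.
from typing import Dict


def per_panel_messages(previous: Dict[str, str], latest: Dict[str, str]) -> Dict[str, str]:
    messages = {}
    for panel, new in latest.items():
        old = previous.get(panel, new)  # a panel seen only once counts as unchanged
        if old != new:  # status_change="Y"
            if new == "degraded":
                messages[panel] = "System is degraded"
            elif new == "normal":
                messages[panel] = "System has returned to normal"
    return messages


if __name__ == "__main__":
    before = {"Panel-1": "normal", "Panel-2": "degraded", "Panel-4": "degraded"}
    after = {"Panel-1": "normal", "Panel-2": "normal", "Panel-4": "degraded"}
    print(per_panel_messages(before, after))
```

With Panel-2 recovering while Panel-4 is still degraded, the model emits "System has returned to normal" for Panel-2, which is the false recovery message this thread is trying to avoid.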

The email alert message simply lists the system_status field using $result.system_status$, so when it fires it says "System is degraded" or "System has returned to normal". This works, but again only per panel. If I have 2 degraded panels and 1 returns to normal, the alert should not fire, because not all 11 panels are currently normal.
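
One way to state the requirement precisely is to roll the per-panel changes up into a single decision per run. The following is a minimal Python sketch of the rule as written in this post (not Splunk code), assuming the previous and latest runs are available as hypothetical Panel-to-Status dictionaries; system_alert is an illustrative name.

```python
# Hypothetical model (not SPL) of the system-level rule described in the post:
#   - "System is degraded" when any panel goes normal -> degraded;
#   - "System has returned to normal" when a degraded panel recovers AND every
#     panel is now normal;
#   - otherwise no alert.
from typing import Dict, Optional


def system_alert(previous: Dict[str, str], latest: Dict[str, str]) -> Optional[str]:
    newly_degraded = any(
        previous.get(panel) == "normal" and status == "degraded"
        for panel, status in latest.items()
    )
    recovered = any(
        previous.get(panel) == "degraded" and status == "normal"
        for panel, status in latest.items()
    )
    all_normal = all(status == "normal" for status in latest.values())

    if newly_degraded:
        return "System is degraded"
    if recovered and all_normal:
        return "System has returned to normal"
    return None  # nothing changed that warrants an email


if __name__ == "__main__":
    before = {"Panel-1": "normal", "Panel-2": "degraded", "Panel-4": "degraded"}
    print(system_alert(before, {**before, "Panel-2": "normal"}))
    print(system_alert(before, {**before, "Panel-2": "normal", "Panel-4": "normal"}))
```

The first call returns None because Panel-4 is still degraded; the second returns "System has returned to normal" because every panel is now normal.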

I am probably overthinking this, or missing something basic. Any assistance would be appreciated.

Thanks!


anmolpatel
Builder

This is what you need:

| makeresults 
| eval _raw = "_time, Last_Received, Minutes_Ago, Panel, Status
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-1, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-2, degraded
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-3, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-4, degraded
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-5, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-6, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-7, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-8, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-9, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-10, normal
03/09/2020 18:30:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-11, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-1, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-2, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-3, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-4, degraded
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-5, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-6, degraded
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-7, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-8, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-9, degraded
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-10, normal
03/09/2020 18:25:00 +0000, 2020-03-09 18:10:20 GMT, 20, Panel-11, normal" 
| multikv forceheader=1 
| table _time, Last_Received, Minutes_Ago, Panel, Status
| streamstats count by Panel
| where count <= 2
`comment("The streamstats and where commands above keep only the two most recent results for each panel")`
| stats list(Status) as status by Panel
| eval state = case(match(mvindex(status,0), "normal") AND match(mvindex(status,1), "normal"), "System status has not changed",
    match(mvindex(status, 0), "normal"), "System has returned to normal",
    match(mvindex(status, 0), "degraded"), "System is degraded")

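list(Status) keeps the values in the order the events arrived (newest first), so mvindex(status,0) is the latest status and mvindex(status,1) is the previous one.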
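
To check the logic without Splunk, here is a minimal Python model of this pipeline (not Splunk code). It assumes events arrive newest first, as in the makeresults sample, and uses plain equality where the SPL uses match(), which behaves the same for these exact values; panel_states is a hypothetical name.

```python
# Hypothetical model (not SPL) of the accepted answer's pipeline.
# Events are assumed to arrive newest first.
from collections import defaultdict
from typing import Dict, List, Tuple


def panel_states(events: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """events: (_time, Panel, Status) tuples, newest first."""
    recent = defaultdict(list)
    for _time, panel, status in events:
        # streamstats count by Panel | where count <= 2
        if len(recent[panel]) < 2:
            # stats list(Status) as status by Panel (keeps arrival order)
            recent[panel].append(status)

    states = {}
    for panel, status in recent.items():
        latest = status[0]  # mvindex(status, 0)
        previous = status[1] if len(status) > 1 else None  # mvindex(status, 1)
        if latest == "normal" and previous == "normal":
            states[panel] = "System status has not changed"
        elif latest == "normal":
            states[panel] = "System has returned to normal"
        elif latest == "degraded":
            states[panel] = "System is degraded"
    return states


if __name__ == "__main__":
    sample = [
        ("03/09/2020 18:30:00 +0000", "Panel-2", "degraded"),
        ("03/09/2020 18:30:00 +0000", "Panel-6", "normal"),
        ("03/09/2020 18:25:00 +0000", "Panel-2", "normal"),
        ("03/09/2020 18:25:00 +0000", "Panel-6", "degraded"),
    ]
    print(panel_states(sample))
```

Note that a panel that was degraded in both runs also maps to "System is degraded", so this step on its own reports the current state of each panel rather than only changes.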





jperrySplunk
New Member

Thanks anmolpatel. This is much simpler than what I was trying to do. I haven't worked much with multivalue fields, but they seem pretty powerful. This is very close to what I need, but when creating the state messages I only want the "System has returned to normal" message when a degraded panel has returned to normal AND all 11 panels have a status of normal. I.e., if I have 2 degraded panels and only 1 returns to normal, the system is still degraded. Is there an easy way to do this, maybe by counting the panels where mvindex(status,0) = "normal"?


anmolpatel
Builder

Here you go, this is from the table command onwards:

| table _time, Last_Received, Minutes_Ago, Panel, Status
| streamstats count by Panel
| where count <= 2
`comment("The streamstats and where commands above keep only the two most recent results for each panel")`
| stats list(Status) as status by Panel
| eval status = case(match(mvindex(status,0), "normal") AND match(mvindex(status,1), "normal"), "System status has not changed",
    match(mvindex(status, 0), "normal"), "System has returned to normal",
    match(mvindex(status, 0), "degraded"), "System is degraded")
| eval system_status = case(match(status, "System is degraded"), 0,
    match(status, "System status has not changed") OR match(status, "System has returned to normal"), 1)
| eventstats sum(system_status) as sum_status count(Panel) as panel_count
| eval overall_status = if(sum_status / panel_count != 1, "System status is degraded", "System is running normally")
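
This roll-up is effectively the count asked about above: each panel scores 1 unless its state is "System is degraded", and the system is normal only when every panel scores 1. A minimal Python model of these last three commands follows (not Splunk code), assuming the per-panel state is one of the three messages and at least one panel is present; overall_status here is a hypothetical function mirroring the field name.

```python
# Hypothetical model (not SPL) of the follow-up answer's roll-up step.
from typing import Dict


def overall_status(states: Dict[str, str]) -> str:
    """states: Panel -> per-panel message from the previous eval
    (assumed to be one of the three messages; at least one panel)."""
    # eval system_status: 0 for a degraded panel, 1 for the other two messages
    system_status = [0 if s == "System is degraded" else 1 for s in states.values()]
    sum_status = sum(system_status)   # eventstats sum(system_status)
    panel_count = len(system_status)  # eventstats count(Panel)
    if sum_status / panel_count != 1:
        return "System status is degraded"
    return "System is running normally"


if __name__ == "__main__":
    print(overall_status({"Panel-1": "System status has not changed",
                          "Panel-4": "System is degraded"}))
```

Comparing sum_status with panel_count directly would express the same test without the division.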

jperrySplunk
New Member

Thanks anmolpatel! I was able to use this for what I needed.
