Solved: Alert Suppression Until Value Is Back In Control

sochsenbein · ‎04-16-2019

I have an alert set up to check our output against a lower control limit and a lower alarm limit. Right now it will page out once it goes below the control limit and then the appropriate groups will fix what caused it to drop. Rather than suppressing the alert for X amount of time, is there a way to suppress the alert until the output field goes back in control - in other words, above the control limit?

badarsebard · ‎04-18-2019

Based on the scenario you've presented, it sounds like you need to "throttle" your alerts until the metric you're measuring returns to normal levels, rather than use the time-based throttling that Splunk provides out of the box. The bad news is that Splunk's built-in throttling can't do this, because both variants (once and per result) are time-based. However, you can do it by using a lookup to track the metric you're measuring.

I'll use a trivial alert as an example. Suppose I create an alert based on a simple count going above a set threshold of 300. I'd define a search such as:

your search here | stats count

And then create an alert trigger like search count > 300. Then we hit your problem of simple time-based alert throttling. For the more advanced case where we need the result to return to normal before we alert again, we'll need to track the results of each search to see when this happens.

The first step is to create a lookup. Let's call it threshold and tie it to a CSV file with a count field and a single row containing 0 to start.
Next, we'll need to modify the alert logic of the search to do the following:

add the previous count from the lookup
set a trigger field with a boolean based on the current count and previous count
output the current count to the lookup
alter the alert logic to trigger off the trigger field boolean

Here's an example:

your search here | stats count | appendcols [| inputlookup threshold | rename count as previous_count | table previous_count] | eval trigger=if(count>300 AND previous_count<=300, 1, 0) | fields count trigger | outputlookup threshold

Then for the alert logic you'd set a custom alert condition to search trigger=1.
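
To see why this fires only on the transition and then stays quiet until the value recovers, here is a minimal Python sketch that simulates successive scheduled runs of the search. It is not Splunk code: the function names `evaluate_run` and `simulate`, the in-memory dict standing in for the threshold lookup, and the sample counts are all assumptions made for illustration.

```python
THRESHOLD = 300  # same threshold as the example alert


def evaluate_run(count, lookup):
    """Simulate one scheduled run of the search.

    `lookup` is a dict standing in for the threshold CSV lookup
    (an assumption for this sketch, not a Splunk API).
    """
    previous_count = lookup["count"]  # inputlookup + rename count as previous_count
    # Fire only when this run is over the threshold and the last run was not.
    trigger = 1 if count > THRESHOLD and previous_count <= THRESHOLD else 0
    lookup["count"] = count  # outputlookup: remember this run for the next one
    return trigger


def simulate(counts):
    """Run the alert logic over a sequence of per-run counts."""
    lookup = {"count": 0}  # lookup seeded with a single row of 0
    return [evaluate_run(c, lookup) for c in counts]


if __name__ == "__main__":
    # Hypothetical counts from six consecutive runs.
    print(simulate([120, 350, 410, 380, 250, 320]))
```

With those hypothetical counts the trigger values are [0, 1, 0, 0, 0, 1]: the alert fires when the count first crosses 300, stays silent while it remains high, and fires again only after it has dropped back to 300 or below and then crossed once more.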

sochsenbein · ‎04-18-2019

@badarsebard I think this will work perfectly. I will set it up this/next week and then come back and mark this as the answer when I've got it going. I appreciate the time you took out of your day to reply!

arlington · ‎02-19-2020

Shouldn't it be ... table previous_count ... ?

your search here | stats count | appendcols [| inputlookup threshold | rename count as previous_count | table previous_count ] | eval trigger=if(count>300 AND previous_count<=300, 1, 0) | fields count trigger | outputlookup threshold

sochsenbein · ‎04-29-2019

@badarsebard this technique works perfectly! Thanks for your time!

sochsenbein · ‎04-18-2019

@pruthvikrishnapolavarapu right, we have the alert set up so that if our output goes below the control limit, it returns an event, which then triggers the alert. What we want is to throttle the alerts so they don't page out again, and to reset once the output goes back above the control limit. I don't think there is a native way to do this, since it's basically asking Splunk to keep persistent data, but I was curious whether someone has done something similar. One thought I had is to create a table in our database with a flag. So, once the alert triggers for the first time, it'll check the flag. If it's false, it'll send an alert; if it's not, it won't do anything. This is just one thought I had.
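
The flag idea can be sketched roughly as below. This is only an illustration in Python, not a Splunk or database integration: the function name `should_page`, the dict standing in for the database row, and the sample readings are hypothetical. The sketch also assumes the flag is cleared as soon as the output is back at or above the control limit, which the description above implies but does not spell out.

```python
def should_page(output, lower_control_limit, state):
    """Decide whether to page, using a persisted 'alerted' flag.

    `state` is a dict standing in for the database row that holds the flag
    (hypothetical; any persistent store would do).
    """
    if output < lower_control_limit:
        if state["alerted"]:
            return False  # already paged for this excursion; stay quiet
        state["alerted"] = True  # first breach: page and set the flag
        return True
    state["alerted"] = False  # back in control: reset so the next breach pages
    return False


if __name__ == "__main__":
    state = {"alerted": False}
    readings = [52, 47, 45, 51, 48]  # hypothetical outputs, control limit of 50
    print([should_page(r, 50, state) for r in readings])
```

For those hypothetical readings the result is [False, True, False, False, True]: one page when the output first drops below 50, nothing while it stays low, and a new page only after it has recovered and dropped again.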

pruthvikrishnap · ‎04-16-2019

Hi Soach,

This can be done with alert throttling: figure out the field and set a limit for when you want the alert to trigger.
Using this, you can set throttling for multiple fields.
https://docs.splunk.com/Documentation/Splunk/6.1.1/Alert/Defineper-resultalerts#Set_up_throttling_fo...
