I have an alert set up to check our output against a lower control limit and a lower alarm limit. Right now it pages out once the output drops below the control limit, and the appropriate groups then fix whatever caused the drop. Rather than suppressing the alert for X amount of time, is there a way to suppress it until the output field is back in control, in other words, above the control limit?
Based on the scenario you've presented, it sounds like you need to "throttle" your alerts until the metric you're measuring returns to normal levels, rather than using the time-based throttling Splunk provides out of the box. The bad news is that you can't achieve this with Splunk throttling, because both variants (once and per result) suppress the alert for a fixed period of time. However, you can do it by using a lookup to track the metric you're measuring.
I'll use a trivial alert as an example. Suppose I create an alert based on a simple count going above a set threshold of 300. I'd define a search such as:
your search here | stats count
And then create a custom alert trigger condition such as search count > 300. This is where we hit your problem with simple time-based throttling. For the more advanced case, where the result must return to normal before we alert again, we need to track the result of each search run so we can see when that happens.
The first step is to create a lookup, let's call it threshold, and tie it to a CSV file with a count field and a single row containing 0 to start.
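If you would rather create that seed file from the search bar than upload a CSV, a search along these lines should work on Splunk versions that include the makeresults command. The file name threshold.csv is an assumption; after running it, add a lookup definition named threshold that points to the file.

```spl
| makeresults
| eval count=0
| table count
| outputlookup threshold.csv
```

This writes a one-row CSV with a single count column set to 0, which matches the starting state described above. The lookup has to exist before the alert's first run, because inputlookup fails on a missing file.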
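Next, we'll need to modify the search to do the following:
1. Read the previous count from the threshold lookup.
2. Create a trigger field with a boolean based on the current count and the previous count.
3. Write the current count and the trigger field boolean back to the lookup.
Here's an example: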
your search here | stats count | appendcols [| inputlookup threshold | rename count as previous_count | table previous_count] | eval trigger=if(count>300 AND previous_count<=300, 1, 0) | fields count trigger | outputlookup threshold
Then, for the alert logic, set a custom alert condition of search trigger=1.
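Your case is the mirror image of this example: you want to page when the output drops below the lower control limit and stay quiet until it recovers. A minimal sketch of the same pattern follows, assuming a hypothetical numeric field named output, a placeholder control limit of 100, and a threshold lookup that stores an output column instead of count. A missing previous value is treated as "in control", so the first run that finds the output below the limit will page.

```spl
your search here
| stats latest(output) AS output
| appendcols
    [| inputlookup threshold
     | rename output AS previous_output
     | table previous_output]
| eval lcl=100
| eval trigger=if(output<lcl AND (isnull(previous_output) OR previous_output>=lcl), 1, 0)
| table output trigger
| outputlookup threshold
```

The alert condition stays search trigger=1. trigger is 1 only on the run where the output first crosses below the limit. While it stays below, previous_output is also below the limit, so trigger remains 0. Once the output recovers to or above the limit, the next drop will page again.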
@badarsebard I think this will work perfectly. I will set it up this/next week and then come back and mark this as the answer when I've got it going. I appreciate the time you took out of your day to reply!
Shouldn't it be ... | table previous_count | ... ?
your search here | stats count | appendcols [| inputlookup threshold | rename count as previous_count | table previous_count ] | eval trigger=if(count>300 AND previous_count<=300, 1, 0) | fields count trigger | outputlookup threshold
@badarsebard this technique works perfectly! Thanks for your time!
@pruthvikrishnapolavarapu Right, we have the alert set up so that if our output goes below the control limit, the search returns an event, which then triggers the alert. What we want is to stop the alert from paging out again, and to reset that suppression once the output goes back above the control limit. I don't think there is a native way to do this, since it basically requires Splunk to keep persistent state, but I was curious whether anyone has done something similar. One thought I had is to create a table in our database with a flag. When the alert triggers, it would check the flag: if the flag is false, it sends an alert; if it's true, it does nothing. This is just one thought I had.
Hi Soach,
This can be done with alert throttling: identify the field to throttle on and set how long the alert should be suppressed after it triggers.
With per-result throttling, you can throttle on multiple fields.
https://docs.splunk.com/Documentation/Splunk/6.1.1/Alert/Defineper-resultalerts#Set_up_throttling_fo...