Alerting

Alert Suppression Until Value Is Back In Control

sochsenbein
Communicator

I have an alert set up to check our output against a lower control limit and a lower alarm limit. Right now it will page out once it goes below the control limit and then the appropriate groups will fix what caused it to drop. Rather than suppressing the alert for X amount of time, is there a way to suppress the alert until the output field goes back in control - in other words, above the control limit?

badarsebard
Communicator

Based on the scenario you've presented, it sounds like you need to "throttle" your alerts until the metric you're measuring returns to normal levels, rather than use the time-based throttling that Splunk provides out of the box. The bad news is that you can't achieve this with Splunk's built-in throttling, since both variants (once and per result) are time-based. However, you can do it by using a lookup to track the metric you're measuring.

I'll use a trivial alert as an example. Suppose I create an alert based on a simple count going above a set threshold of 300. I'd define a search such as:

your search here | stats count

And then create an alert trigger like search count > 300. That is where we hit your problem: the only throttling available is time-based. For the more advanced case, where the result has to return to normal before we alert again, we'll need to track the result of each search run so we can see when that happens.

The first step is to create a lookup. Let's call it threshold and tie it to a CSV file with a count field and a single row containing 0 to start.
Next, we'll need to modify the search to do the following:

  • add the previous count from the lookup
  • set a trigger field with a boolean based on the current count and previous count
  • output the current count to the lookup
  • alter the alert logic to trigger off the trigger field boolean

Here's an example:

your search here | stats count | appendcols [| inputlookup threshold | rename count as previous_count | table count] | eval trigger=if(count>300 AND previous_count<=300, 1, 0) | fields count trigger | outputlookup threshold

Then for the alert logic you'd set a custom alert condition to search trigger=1.
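
To see why this only fires on the crossing, here is a minimal Python sketch that simulates a few scheduled runs outside Splunk. It is an illustration under assumptions, not Splunk code: the `run_alert` function, the `lookup` dict standing in for the threshold CSV, and the sample counts are all hypothetical.

```python
THRESHOLD = 300  # same threshold as the example alert above


def run_alert(count, lookup):
    """Simulate one scheduled run; `lookup` stands in for the threshold CSV."""
    previous_count = lookup["count"]  # what inputlookup would return
    # Fire only when this run is above the threshold and the last run was not.
    trigger = 1 if count > THRESHOLD and previous_count <= THRESHOLD else 0
    lookup["count"] = count  # what outputlookup would store for the next run
    return trigger


lookup = {"count": 0}  # the CSV starts with a single row containing 0
counts = [120, 350, 410, 380, 290, 320]  # made-up results of six runs
triggers = [run_alert(c, lookup) for c in counts]
print(triggers)  # [0, 1, 0, 0, 0, 1]
```

The alert fires on the second run (first crossing), stays quiet while the count remains above 300, and fires again only after the count has dropped back to 300 or below and crossed once more.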

sochsenbein
Communicator

@badarsebard I think this will work perfectly. I will set it up this/next week and then come back and mark this as the answer when I've got it going. I appreciate the time you took out of your day to reply!

arlington
Explorer

Shouldn't it be ... table previous_count ... ?

your search here | stats count | appendcols [| inputlookup threshold | rename count as previous_count | table previous_count ] | eval trigger=if(count>300 AND previous_count<=300, 1, 0) | fields count trigger | outputlookup threshold

sochsenbein
Communicator

@badarsebard this technique works perfectly! Thanks for your time!

sochsenbein
Communicator

@pruthvikrishnapolavarapu right, we have the alert set up so that if our output goes below the control limit, it returns an event, which then triggers the alert. What we want is to stop the alert from paging out again, and to reset once the output goes back above the control limit. I don't think there is a native way to do this, since it's basically asking Splunk to keep persistent data, but I was curious whether someone has done something similar. One thought I had is to create a table in our database with a flag. When the alert condition is met, it checks the flag. If the flag is false, it sends an alert; if not, it doesn't do anything.
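
As a rough sketch of that flag idea, assuming the flag lives in some persistent store (a plain dict here) and that a reading exactly at the control limit counts as in control, the check could look like the Python below. The names `should_page` and `state` and the sample readings are hypothetical.

```python
def should_page(output, control_limit, state):
    """Return True only on the first reading below the lower control limit."""
    out_of_control = output < control_limit
    # Page only if we are out of control and have not already paged.
    page = out_of_control and not state["alerted"]
    # The flag follows the condition, so it resets once output is back in control.
    state["alerted"] = out_of_control
    return page


state = {"alerted": False}  # stand-in for the flag row in the database table
readings = [52, 48, 47, 51, 49]  # made-up output values
pages = [should_page(r, 50, state) for r in readings]
print(pages)  # [False, True, False, False, True]
```

The second reading pages, the third is suppressed because the flag is still set, and the last one pages again because the fourth reading reset the flag.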

pruthvikrishnap
Contributor

Hi Soach,

This can be done with alert throttling: figure out which field to throttle on and set the limit at which you want the alert to trigger.
You can also set throttling for multiple fields this way.
https://docs.splunk.com/Documentation/Splunk/6.1.1/Alert/Defineper-resultalerts#Set_up_throttling_fo...
