
Alert suppression: List throttled/suppressed field values

ledj
Engager

Hi,

I wanted to find out why an alert wasn't fired for a particular resource. So I looked for which field values (I don't know how to describe it better) are currently stored in Splunk to suppress the alert action from being executed.

To make clearer what I mean, here is a short fictitious example:

Use case: Monitoring the CPU usage of hosts. When CPU usage hits the 80% threshold, fire an alert and throttle the alert for 1 hour, based on the host field.
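To make the throttling semantics of this example concrete, here is a small conceptual model in Python. It is not how Splunk implements suppression internally; the class `ThrottleModel` and its methods are hypothetical names for illustration. It fires at most once per host per window, and unlike a list of only currently active suppressions, it also keeps a history, so you can ask which hosts were throttled at some past moment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ThrottleModel:
    """Hypothetical model of per-field alert throttling (not Splunk's internals)."""
    window: timedelta
    # Currently known suppressions: throttle-field value (e.g. host) -> expiry time
    active: dict = field(default_factory=dict)
    # Every suppression window ever started, as (value, start, expiry) tuples
    history: list = field(default_factory=list)

    def should_fire(self, value: str, now: datetime) -> bool:
        """Return True if the alert fires for this value; start a new window if so."""
        expiry = self.active.get(value)
        if expiry is not None and now < expiry:
            return False  # still inside the throttle window for this value
        new_expiry = now + self.window
        self.active[value] = new_expiry
        self.history.append((value, now, new_expiry))
        return True

    def throttled_at(self, when: datetime) -> list:
        """Values whose throttle window covered `when`, including long-expired ones."""
        return sorted({v for v, start, end in self.history if start <= when < end})


# Example: CPU >= 80% events as (host, time), throttled for 1 hour per host
t0 = datetime(2020, 7, 6, 12, 0)
throttle = ThrottleModel(window=timedelta(hours=1))
events = [
    ("web01", t0),
    ("web02", t0 + timedelta(minutes=10)),
    ("web01", t0 + timedelta(minutes=30)),  # suppressed: web01 window ends 13:00
    ("web01", t0 + timedelta(minutes=70)),  # fires again: window has expired
]
fired = [(host, ts) for host, ts in events if throttle.should_fire(host, ts)]
print([host for host, _ in fired])
print(throttle.throttled_at(t0 + timedelta(minutes=65)))
```

The `active` dict alone is enough to decide whether to fire; the `history` list is what makes "why didn't it fire at 12:30?" answerable after the window has passed.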

Question: How can I determine for which hosts the alert is throttled?

Note: I'm interested in the throttling list the alert itself uses, not in approaches that evaluate the CPU usage events.

Thank you in advance.
Jens


jwelch_splunk
Splunk Employee

You have a scheduled search that runs at specific times. For the time range in question, run:

index=_internal source=*scheduler.log host=<SHC hosts or standalone SH name> savedsearch_name="*name of search here*"

Fields to look at:

result_count: Were results found?

status: Did the search run, or was it skipped?

alert_actions: If populated, the alert action most likely fired, unless you had a large result set, in which case perhaps some results triggered it and some were throttled.

suppression: Were one or more results from your result set throttled?

This tells you whether the search found results, whether it executed an alert action, and whether something in the results was suppressed (i.e., throttled).
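If you export scheduler.log events and want to check these fields outside Splunk, a minimal Python sketch could pull the key=value pairs out of each line. The sample line below is made up for illustration, and the field names (`result_count`, `status`, `alert_actions`, `suppression`) are taken from the list in this answer; check them against your own scheduler.log before relying on this. Inside Splunk, the search itself already extracts these fields for you.

```python
import re

# key=value or key="quoted value" pairs, as written in Splunk's internal logs
KV_PATTERN = re.compile(r'(\w+)=("([^"]*)"|[^,\s]*)')


def parse_kv(line: str) -> dict:
    """Extract key=value pairs from one log line; quoted values lose their quotes."""
    fields = {}
    for m in KV_PATTERN.finditer(line):
        quoted = m.group(3)
        fields[m.group(1)] = quoted if quoted is not None else m.group(2)
    return fields


def summarize(fields: dict) -> dict:
    """Answer the questions from the field list; missing fields come back as None."""
    count = fields.get("result_count")
    return {
        "found_results": count is not None and count.isdigit() and int(count) > 0,
        "skipped": fields.get("status") == "skipped",
        "alert_actions": fields.get("alert_actions"),
        "suppression": fields.get("suppression"),
    }


# Hypothetical scheduler.log line, for illustration only
sample = ('07-06-2020 12:00:05.123 +0000 INFO '
          'savedsearch_name="CPU over 80", status=success, '
          'result_count=3, alert_actions="email"')
print(summarize(parse_kv(sample)))
```

Because the sample line has no `suppression` field, that entry comes back as None rather than raising, which keeps the summary usable on lines that only carry some of the fields.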

There are CSV suppression files in /opt/splunk/var/run/splunk/scheduler/suppression/.

These are the files that are checked to determine whether something is throttled. The issue here is that most entries in this data contain a hash of the fields you are suppressing on, so you can't decode them. But you can see other information about the throttles, for example:

key,expire,ACTION,MD5
"admin;SA-Utils;Audit - Script Errors;1c40161ea84755387fddbdfd9babb74e;",1594015219,ADD,0E4E58D12C9EC1325A555B91A142A336
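To get the readable parts out of such a file, a minimal Python sketch could parse it with the standard csv module and turn `expire` into a timestamp. The split of `key` into owner, app, search name and hash is an assumption based on the single sample row above, the `expire` column is assumed to be a Unix epoch in seconds, and the `MD5` column is passed through uninterpreted. It still cannot recover the hashed field values; it only shows which searches have throttles and whether they have expired.

```python
import csv
import io
from datetime import datetime, timezone


def parse_suppression_csv(text: str, now: datetime) -> list:
    """Parse a suppression CSV (key,expire,ACTION,MD5) into readable dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumed key layout, inferred from the sample: owner;app;search name;hash;
        parts = row["key"].split(";")
        owner, app, search, digest = (parts + [""] * 4)[:4]
        # Assumption: expire is a Unix epoch in seconds
        expires = datetime.fromtimestamp(int(row["expire"]), tz=timezone.utc)
        rows.append({
            "owner": owner,
            "app": app,
            "search": search,
            "hash": digest,  # hash of the throttled field values; not reversible
            "action": row["ACTION"],
            "md5": row["MD5"],
            "expires": expires,
            "active": now < expires,
        })
    return rows


sample = (
    'key,expire,ACTION,MD5\n'
    '"admin;SA-Utils;Audit - Script Errors;1c40161ea84755387fddbdfd9babb74e;",'
    '1594015219,ADD,0E4E58D12C9EC1325A555B91A142A336\n'
)
now = datetime.now(timezone.utc)
for r in parse_suppression_csv(sample, now):
    print(r["search"], r["expires"].isoformat(), "active" if r["active"] else "expired")
```

To read the real files, pass the contents of each CSV in /opt/splunk/var/run/splunk/scheduler/suppression/ instead of the inline sample.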

Maybe this can help you out some?

The real question 

ledj
Engager

Thanks for your answer.

This will help when we want to check the current status of throttled values. But what I forgot to mention is that I also want to see the historical status of values that are no longer throttled.

My real use case was to understand why an alert fired for some, but not all, expected results, and to do so at a time when the throttling had already expired.

Your approach is very nice, but cumbersome to use day to day, as there is no direct way to get information about the throttling status (values need to be hashed to look them up in the throttling CSV files).

Nevertheless I learned some new stuff. Thank you for that. 🙂

As a possible workaround, since this information is not directly accessible in Splunk, I got a hint to use the Log Event alert action to write the desired information into an index.


thambisetty
SplunkTrust

I don't think this is something we can get in any way in Splunk.
