Alert that triggers based on errors per minute over time

karlduncans
Engager

I'm looking for a way to make an alert trigger only if a certain number of events occurs in every minute of a 3-minute period.

Right now:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg

This runs every 3 minutes, and the alert is set to trigger if the number of events is greater than 5.

The problem is that a single-minute spike of more than 5 errors anywhere in that 3-minute search window triggers the alert. I'd like the alert to trigger only if every minute in the 3-minute period has more than 5 errors.

Thanks in advance!

1 Solution

chanfoli
Builder

If your threshold is based on total errors per minute, you can do something like this:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count as TotalErrCount values(errormsg) by _time
| where TotalErrCount>5

Then set the alert to trigger when the search returns 3 results, i.e. when all 3 minutes exceed the threshold. If instead you want to count each specific message separately, the search below should do it, again alerting on getting 3 results. It would take some refinement to show per-message counts in the results; the goal here is to return 3 results only if, in each of the 3 minutes, at least one specific error message occurred more than 5 times (it does not have to be the same message every minute):

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count as ErrCount by _time errormsg
| where ErrCount>5
| stats values(errormsg) by _time
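A stricter variant, offered here as an untested sketch: if the alert should fire only when the same error message exceeds 5 in every one of the 3 minutes, the per-minute rows can be collapsed per message with dc(_time), which also gives the per-message counts mentioned above. It assumes the field is called errormsg (as in the question) and that each run covers exactly three 1-minute buckets via earliest=-4m@m latest=-1m@m. The output field names MinutesOver, MinPerMinute and MaxPerMinute are hypothetical, chosen only for illustration.

```spl
index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count as ErrCount by _time errormsg
| where ErrCount>5
| stats dc(_time) as MinutesOver min(ErrCount) as MinPerMinute max(ErrCount) as MaxPerMinute by errormsg
| where MinutesOver=3
```

Because the "all 3 minutes" check happens inside the search, the alert condition becomes "number of results is greater than 0", and each result row names one message together with its lowest and highest per-minute count.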

karlduncans
Engager

Your second query is what I was looking for. Thank you!

chanfoli
Builder

So are you trying to alert when the overall error count exceeds 5 in all 3 minutes, or do the individual errormsg values need to factor in?

pradeepkumarg
Influencer

Does your search always return 3 rows? If so, you can try something like this:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg
| eval flag=if(count>5, 1, 0)
| eventstats sum(flag) as total
| search total=3
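If the search can return more than 3 rows (one row per message per minute), one possible adjustment, again an untested sketch, is to compute the sum per message by adding "by errormsg" to eventstats, so that total=3 means one particular errormsg exceeded 5 in all three minutes. The field name errormsg comes from the question; flag and total come from the answer above.

```spl
index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg
| eval flag=if(count>5, 1, 0)
| eventstats sum(flag) as total by errormsg
| search total=3
```

This keeps the three per-minute rows for each message that stayed above the threshold, so the alert can trigger when the number of results is greater than 0. A minute in which a message did not appear at all produces no row for it, so that message's total stays below 3, which is the intended behavior.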