Alerting

Alert that triggers based on errors per minute over time

karlduncans
Engager

I'm looking for a way to make an alert trigger only if a certain amount of events occur within a 3 minute period, per minute.

Right now:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg

Running every 3 minutes, and alert set to trigger if events > 5

Problem is, if there is a single minute spike of errors that exceeds 5 errors within that 3 minute search period, the alert will trigger. I'm looking for help in having the alert trigger only if each minute during that 3 minute period exceeds 5 errors.

Thanks in advance!

Tags (1)
0 Karma
1 Solution

chanfoli
Builder

If your threshold is total errors per minute based you can do something like this:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as TotalErrCount values(errmsg) by _time | where TotalErrCount>5

Then alert if you get 3 results. If you are wanting to total each specific message and look for cases where there is a specific errmsg with >5 for all 3 minutes then this should do it if you alert on getting 3 results, it would take some refinement to provide per message counts in the results, the goal here was to give 3 results only if at least one specific errmesg occurred more than 5 times in each minute:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as ErrCount by _time errmesg | where ErrCount>5 | stats values(errmesg) by _time

View solution in original post

chanfoli
Builder

If your threshold is total errors per minute based you can do something like this:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as TotalErrCount values(errmsg) by _time | where TotalErrCount>5

Then alert if you get 3 results. If you are wanting to total each specific message and look for cases where there is a specific errmsg with >5 for all 3 minutes then this should do it if you alert on getting 3 results, it would take some refinement to provide per message counts in the results, the goal here was to give 3 results only if at least one specific errmesg occurred more than 5 times in each minute:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as ErrCount by _time errmesg | where ErrCount>5 | stats values(errmesg) by _time

karlduncans
Engager

Your 2nd query is what i was looking for. Thank you!

0 Karma

chanfoli
Builder

So are you trying to alert when the overall count of errors exceeds 5 for all 3 minutes or do the specific different errormsg values need to factor in?

0 Karma

pradeepkumarg
Influencer

Does your search always results in 3 rows? If so, you can try something like this

index=stuff earliest=-4m@m latest=-1m@m 
| bucket span=1m _time
| stats count by _time errormsg | eval flag = if(count >5, 1, 0) | eventstats sum(flag) as total | search total = 3
0 Karma
Get Updates on the Splunk Community!

Splunk Observability Cloud | Unified Identity - Now Available for Existing Splunk ...

Raise your hand if you’ve already forgotten your username or password when logging into an account. (We can’t ...

Index This | How many sides does a circle have?

February 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

Registration for Splunk University is Now Open!

Are you ready for an adventure in learning?   Brace yourselves because Splunk University is back, and it's ...