Alert that triggers based on errors per minute over time

karlduncans
Engager

I'm looking for a way to make an alert trigger only if a certain number of events occurs in every minute of a 3-minute period.

Right now:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg

This runs every 3 minutes, and the alert is set to trigger if events > 5.

The problem is that if a single-minute spike pushes the error count above 5 within that 3-minute search period, the alert triggers. I'm looking for help making the alert trigger only if each minute in that 3-minute period has more than 5 errors.
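To make the difference concrete, here is a small Python model of the two conditions. It is not Splunk code: minute indices 0, 1, 2 stand in for the three _time buckets, the function names and sample data are hypothetical, and treating the current alert as "3-minute total > 5" is an assumption made for illustration.

```python
from collections import Counter

THRESHOLD = 5  # errors per minute, taken from the question

def per_minute_counts(event_minutes, window):
    """Count events in each one-minute bucket of the window (0 for empty minutes)."""
    counts = Counter(event_minutes)
    return [counts[m] for m in window]

def fires_on_window_total(event_minutes, window):
    # Assumed model of the current alert: the 3-minute total exceeds the threshold,
    # so a single busy minute is enough to trigger it.
    return sum(per_minute_counts(event_minutes, window)) > THRESHOLD

def fires_every_minute(event_minutes, window):
    # Desired behavior: every minute in the window must exceed the threshold.
    return all(c > THRESHOLD for c in per_minute_counts(event_minutes, window))

window = [0, 1, 2]                       # hypothetical minute indices
spike = [1] * 8                          # 8 errors, all in minute 1
sustained = [0] * 6 + [1] * 7 + [2] * 6  # 6, 7 and 6 errors per minute

print(fires_on_window_total(spike, window), fires_every_minute(spike, window))          # True False
print(fires_on_window_total(sustained, window), fires_every_minute(sustained, window))  # True True
```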

Thanks in advance!

chanfoli
Builder

If your threshold is based on total errors per minute, you can do something like this:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count as TotalErrCount values(errormsg) by _time | where TotalErrCount>5
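As a rough illustration of what this first search computes, the Python sketch below models it over a list of minute indices (0, 1, 2 standing in for the three _time buckets). It is not Splunk code; the function names and sample data are hypothetical, and it assumes the alert condition is "exactly 3 results".

```python
from collections import Counter

def minutes_over_threshold(event_minutes, threshold=5):
    """Mimics: bucket span=1m _time | stats count by _time | where count>threshold.

    Like stats ... by _time, only minutes that actually contain events produce a row.
    """
    counts = Counter(event_minutes)
    return sorted(m for m, c in counts.items() if c > threshold)

def alert_on_three_results(event_minutes, window_minutes=3, threshold=5):
    # The search window (-4m@m to -1m@m) spans 3 whole minutes, so 3 surviving rows
    # means every minute exceeded the threshold.
    return len(minutes_over_threshold(event_minutes, threshold)) == window_minutes

busy = [0] * 6 + [1] * 9 + [2] * 7        # 6, 9 and 7 errors per minute
one_quiet = [0] * 6 + [1] * 3 + [2] * 7   # minute 1 has only 3 errors

print(minutes_over_threshold(busy), alert_on_three_results(busy))            # [0, 1, 2] True
print(minutes_over_threshold(one_quiet), alert_on_three_results(one_quiet))  # [0, 2] False
```

Because stats ... by _time emits no row for a minute with no matching events, an empty minute can never count toward the 3 results.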

Then alert if you get 3 results.

If you want to count each specific message separately, so that a minute only counts when some single errormsg value occurs more than 5 times in it, this should do it, again alerting on 3 results. It would take some refinement to show per-message counts in the results; the goal here was to return 3 results only if, in each of the 3 minutes, at least one errormsg value occurred more than 5 times:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count as ErrCount by _time errormsg | where ErrCount>5 | stats values(errormsg) by _time
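A similar Python model of the per-message search is below, again a sketch rather than Splunk code, with hypothetical message values such as "timeout" and "refused". It also shows one property worth knowing: each minute only needs some errormsg value over the threshold, not necessarily the same one in every minute.

```python
from collections import Counter, defaultdict

def noisy_messages_by_minute(events, threshold=5):
    """events: iterable of (minute, errormsg) pairs.

    Mimics: stats count by _time errormsg | where count>threshold
            | stats values(errormsg) by _time
    Returns {minute: sorted list of errormsg values over the threshold}.
    """
    counts = Counter(events)                  # one entry per (minute, errormsg) row
    noisy = defaultdict(set)
    for (minute, msg), c in counts.items():
        if c > threshold:
            noisy[minute].add(msg)
    # values() yields distinct values in lexicographic order, hence sorted()
    return {m: sorted(msgs) for m, msgs in sorted(noisy.items())}

events = ([(0, "timeout")] * 6 + [(0, "refused")] * 2
          + [(1, "timeout")] * 7 + [(2, "refused")] * 8)   # hypothetical messages

rows = noisy_messages_by_minute(events)
print(rows)             # {0: ['timeout'], 1: ['timeout'], 2: ['refused']}
print(len(rows) == 3)   # True: the alert fires
# No single message was over the threshold in all three minutes:
print(set.intersection(*(set(v) for v in rows.values())))  # set()
```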

karlduncans
Engager

Your second query is what I was looking for. Thank you!

chanfoli
Builder

So are you trying to alert when the overall count of errors exceeds 5 in all 3 minutes, or do the individual errormsg values need to factor in?

pradeepkumarg
Influencer

Does your search always return 3 rows? If so, you can try something like this:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg | eval flag=if(count>5, 1, 0) | eventstats sum(flag) as total | search total=3
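The flag/eventstats logic can be modeled the same way. This Python sketch is not Splunk code and its names and data are hypothetical; it shows why the answer asks whether the search always returns 3 rows: eventstats sums the flags across every (minute, errormsg) row, so total=3 only implies "every minute exceeded 5" when there is exactly one row per minute.

```python
from collections import Counter

def flagged_rows(events, threshold=5, required=3):
    """events: iterable of (minute, errormsg) pairs.

    Mimics: stats count by _time errormsg | eval flag=if(count>threshold,1,0)
            | eventstats sum(flag) as total | search total=required
    """
    counts = Counter(events)
    rows = [{"minute": m, "errormsg": msg, "count": c, "flag": int(c > threshold)}
            for (m, msg), c in sorted(counts.items())]
    total = sum(r["flag"] for r in rows)   # eventstats adds the same total to every row
    for r in rows:
        r["total"] = total
    return [r for r in rows if r["total"] == required]

# One message per minute, the case the answer assumes: 3 rows, all flagged.
single = [(0, "timeout")] * 6 + [(1, "timeout")] * 7 + [(2, "timeout")] * 8
print(len(flagged_rows(single)))   # 3

# Two messages in minute 0 can also reach total=3 while minute 2 stays quiet.
mixed = ([(0, "timeout")] * 6 + [(0, "refused")] * 7
         + [(1, "timeout")] * 9 + [(2, "timeout")] * 2)
print(len(flagged_rows(mixed)))    # 4
```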