Alerting

Alert that triggers based on errors per minute over time

karlduncans
Engager

I'm looking for a way to make an alert trigger only if a certain amount of events occur within a 3 minute period, per minute.

Right now:

index=stuff earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count by _time errormsg

Running every 3 minutes, and alert set to trigger if events > 5

Problem is, if there is a single minute spike of errors that exceeds 5 errors within that 3 minute search period, the alert will trigger. I'm looking for help in having the alert trigger only if each minute during that 3 minute period exceeds 5 errors.

Thanks in advance!

Tags (1)
0 Karma
1 Solution

chanfoli
Builder

If your threshold is total errors per minute based you can do something like this:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as TotalErrCount values(errmsg) by _time | where TotalErrCount>5

Then alert if you get 3 results. If you are wanting to total each specific message and look for cases where there is a specific errmsg with >5 for all 3 minutes then this should do it if you alert on getting 3 results, it would take some refinement to provide per message counts in the results, the goal here was to give 3 results only if at least one specific errmesg occurred more than 5 times in each minute:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as ErrCount by _time errmesg | where ErrCount>5 | stats values(errmesg) by _time

View solution in original post

chanfoli
Builder

If your threshold is total errors per minute based you can do something like this:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as TotalErrCount values(errmsg) by _time | where TotalErrCount>5

Then alert if you get 3 results. If you are wanting to total each specific message and look for cases where there is a specific errmsg with >5 for all 3 minutes then this should do it if you alert on getting 3 results, it would take some refinement to provide per message counts in the results, the goal here was to give 3 results only if at least one specific errmesg occurred more than 5 times in each minute:

earliest=-4m@m latest=-1m@m
| bucket span=1m _time
| stats count  as ErrCount by _time errmesg | where ErrCount>5 | stats values(errmesg) by _time

karlduncans
Engager

Your 2nd query is what i was looking for. Thank you!

0 Karma

chanfoli
Builder

So are you trying to alert when the overall count of errors exceeds 5 for all 3 minutes or do the specific different errormsg values need to factor in?

0 Karma

pradeepkumarg
Influencer

Does your search always results in 3 rows? If so, you can try something like this

index=stuff earliest=-4m@m latest=-1m@m 
| bucket span=1m _time
| stats count by _time errormsg | eval flag = if(count >5, 1, 0) | eventstats sum(flag) as total | search total = 3
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...