Re: How can I create an alert that monitors for er...

Cheng2Ready

Hi guys, just need some brain picking.
How can I create an alert that triggers when errors persist for more than 2 minutes?

index=your_logs "error"
| bin _time span=1m
| stats count as error_count by _time
| streamstats window=2 current=t count(error_count) as consecutive_error_minutes
| where consecutive_error_minutes >= 2
| stats count as alert_trigger

But when I set up the alert like this:

Time Range: Last 5 minutes
Cron Schedule: */5 * * * * (runs every 5 minutes)
Trigger Condition: Number of Results > 0

and also with this time range on the same schedule:

- Earliest: -7m
- Latest: -2m

For some reason the alert just triggered with
alert_trigger = 0
Not sure what went wrong?

PickleRick

Adding to what has already been said, it seems you'd be best off with just streamstats, but with a time window.

index=your_logs "error"
| streamstats time_window=2m count values(_time) as _time
| where count>=2

If you have multiple types of error and want to check each of them separately you can add some form of a "by" clause to streamstats like

by errorcode

One caveat: for long-lasting errors it will give you several results, one for each subsequent error event.
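
To make that caveat concrete, here is a rough Python model (not SPL) of a 2-minute sliding count. The function name sliding_counts and the sample timestamps are made up for illustration, and the exact boundary handling of streamstats time_window may differ from this sketch.

```python
from datetime import datetime, timedelta


def sliding_counts(event_times, window=timedelta(minutes=2)):
    """For each event, count the events that fall in (t - window, t]."""
    times = sorted(event_times)
    results = []
    start = 0  # index of the oldest event still inside the window
    for i, t in enumerate(times):
        # Drop events that are a full window (or more) older than t.
        while times[start] <= t - window:
            start += 1
        results.append((t, i - start + 1))
    return results


base = datetime(2024, 1, 1, 8, 0, 0)
# Hypothetical data: one error every 30 seconds for 2 minutes,
# then one isolated error much later.
events = [base + timedelta(seconds=30 * k) for k in range(5)]
events.append(base + timedelta(minutes=10))

# Equivalent of "| where count>=2": one row per qualifying event.
hits = [(t, n) for t, n in sliding_counts(events) if n >= 2]
```

In this sample a single ongoing error run produces four qualifying rows, which is the "several results" effect described above. The isolated error at the end produces none.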

Cheng2Ready

Thank you, I'll give this a try.

isoutamo

Then you need to understand what you mean by "within two minutes".
Does it mean the events fall in adjacent clock minutes (xx:y1:zz), or that they happened within a two-minute span, counting milliseconds too? If the first is enough, then bin is the correct answer, but if it's the second, then you need something like stats + range. And always try to use stats first instead of *stats, as this way you can utilize the indexers' parallelism (map + reduce), get better response times and use fewer resources!
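
A small Python sketch (not SPL) of the difference between the two interpretations. The helper names and timestamps are hypothetical. minute_buckets mimics what a 1-minute bin looks at, and time_range_seconds mimics a range over _time.

```python
from datetime import datetime


def minute_buckets(times):
    """bin-style view: the distinct clock minutes that contain an event."""
    return sorted({t.replace(second=0, microsecond=0) for t in times})


def time_range_seconds(times):
    """range-style view: elapsed time between first and last event."""
    return (max(times) - min(times)).total_seconds()


# Two errors 0.2 seconds apart that happen to straddle a minute boundary.
burst = [
    datetime(2024, 1, 1, 8, 1, 59, 900000),
    datetime(2024, 1, 1, 8, 2, 0, 100000),
]

# Two errors almost two full minutes apart, also in adjacent clock minutes.
spread = [
    datetime(2024, 1, 1, 8, 1, 0),
    datetime(2024, 1, 1, 8, 2, 59),
]
```

Both samples occupy two adjacent minute buckets, so a bin-based search treats them the same, while the range view separates a 0.2 second burst from errors that are 119 seconds apart.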

bowesmana

I think your logic is flawed. stats by _time only produces rows for minutes that contain errors, so if there are gaps (minutes with no error), the rows streamstats sees are not "consecutive" minutes, just adjacent rows. If you have errors at 8:01 and 8:03, you will get a count of 2 consecutive errors, which I assume is not what you want.

You would be better off using timechart, as that will give you a populated value for every time interval. See this example using timechart and a changed streamstats. Run it with a time range of last 60 minutes and you can see the effect. Comment out the last line to see how the count is calculated.

| makeresults count=20
| eval _time=now() - ((random() % 30) * 60)
| timechart span=1m count as error_count 
| streamstats window=2 current=t count(eval(error_count>0)) as consecutive_error_minutes
| where consecutive_error_minutes >= 2

This will return you a list of minutes where the consecutive error count was >=2.

Note that this drops the first minute of each error run: streamstats records a count of 1 for that minute, so it fails the >= 2 filter. Again, is this what you want?
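
For anyone who wants to see the gap problem outside Splunk, here is a rough Python model (not SPL) of the two approaches. Minutes are plain integers (minutes since midnight), the function names are made up, and the zero fill only covers the span between the first and last error, whereas timechart fills the whole search time range.

```python
from collections import Counter


def flagged_without_fill(error_minutes):
    """Model of stats by _time + streamstats window=2: only minutes
    that had errors exist as rows, so gaps are invisible."""
    rows = sorted(set(error_minutes))
    flagged = []
    for i, minute in enumerate(rows):
        window = rows[max(0, i - 1): i + 1]  # current row plus the previous one
        if len(window) >= 2:
            flagged.append(minute)
    return flagged


def flagged_with_fill(error_minutes):
    """Model of timechart + count(eval(error_count>0)): empty minutes
    become rows with a count of 0 and break the run."""
    counts = Counter(error_minutes)
    if not counts:
        return []
    timeline = [(m, counts.get(m, 0)) for m in range(min(counts), max(counts) + 1)]
    flagged = []
    for i, (minute, _) in enumerate(timeline):
        window = timeline[max(0, i - 1): i + 1]
        if sum(1 for _, c in window if c > 0) >= 2:
            flagged.append(minute)
    return flagged


gap = [8 * 60 + 1, 8 * 60 + 3]                  # errors at 8:01 and 8:03
run = [7 * 60 + 57, 7 * 60 + 58, 7 * 60 + 59]   # errors at 7:57, 7:58, 7:59
```

With the gap sample, the version without fill flags 8:03 even though 8:02 had no error, and the filled version flags nothing. With the run sample, the filled version flags 7:58 and 7:59 but not 7:57, which is the dropped first minute mentioned above.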

Hopefully this helps, but add any extra detail if this does not get you to where you want to get to.

Cheng2Ready

That is a good point!

What should happen is:
Errors at 8:01 and 8:03 should not trigger, since no error event was logged at 8:02 (one minute is missing),
but
errors at 7:57, 7:58 and 7:59 should trigger: from :57 to :59 is 2 minutes, so it should trigger.

Hope this helps?

bowesmana

And another small, but significant issue is that you will "miss" consecutive errors that occur on your search boundary, e.g. your search runs at 0, 5, 10... and searches 53-58, 58-03, 03-08...

But as you're requiring a count of 2 or more, you're only actually looking at 4 possible minutes.

So, if you have errors at 02 and 03, they fall into different searches (58-03 and 03-08) and you will never see them as 2 consecutive errors. So you want your search window to be -8 to -2, which gives a 1 minute overlap, so that you're using a full 5 minute window for the >1 error count.
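
The boundary effect can be checked with a little arithmetic. This is a Python sketch (not SPL) with made-up helper names. It treats each search as a half-open range of whole minutes and asks which pairs of adjacent minutes never land inside a single search.

```python
def windows(run_minutes, earliest, latest):
    """Half-open [run + earliest, run + latest) windows, in minutes.
    earliest and latest are negative offsets such as -7 and -2."""
    return [(r + earliest, r + latest) for r in run_minutes]


def missed_pairs(pair_starts, wins):
    """Return the minutes m for which m and m + 1 never share a window."""
    return [
        m for m in pair_starts
        if not any(lo <= m and m + 1 < hi for lo, hi in wins)
    ]


# Runs at :00, :05 and :10, written as minutes counted from the previous hour
# (so 60 is :00 and 62 is :02).
runs = [60, 65, 70]
pairs = range(53, 67)  # every adjacent pair that starts inside the covered span

no_overlap = missed_pairs(pairs, windows(runs, -7, -2))
with_overlap = missed_pairs(pairs, windows(runs, -8, -2))
```

With -7 to -2 the pairs starting at minute 57 and minute 62 (that is, :57/:58 and :02/:03) are split across two searches. With -8 to -2 no pair is missed, at the cost of some pairs being seen by two consecutive runs.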

richgalloway

When the query ends with stats count, it will always return one result. Therefore, Number of Results > 0 will always trigger the alert. Add a where command to the alert so it only returns results if there are consecutive errors.

index=your_logs "error"
| bin _time span=1m
| stats count as error_count by _time
| streamstats window=2 current=t count(error_count) as consecutive_error_minutes
| where consecutive_error_minutes >= 2
| stats count as alert_trigger
| where alert_trigger > 0
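
To see why the final where matters, here is a tiny Python model (not SPL) of the last two steps. The function names are hypothetical stand-ins for stats count, where, and the "Number of Results > 0" trigger condition.

```python
def stats_count(rows):
    """Model of a trailing '| stats count as alert_trigger':
    always exactly one result row, even when the input is empty."""
    return [{"alert_trigger": len(rows)}]


def where(rows, predicate):
    """Model of '| where ...': keep only the rows that match."""
    return [row for row in rows if predicate(row)]


def alert_fires(results):
    """Model of the 'Number of Results > 0' trigger condition."""
    return len(results) > 0


no_matches = []  # nothing survived the consecutive_error_minutes filter

without_where = stats_count(no_matches)
with_where = where(stats_count(no_matches), lambda r: r["alert_trigger"] > 0)
```

Without the extra where, the single row with alert_trigger = 0 still counts as a result, so the alert fires. With it, the row is filtered out and nothing is returned.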

That said, I have doubts about the methodology used. The current query will trigger if two consecutive errors are detected, but what if they're different errors? Does it matter? I would think that two different errors would not be considered "persistence".


Cheng2Ready

"What if they're different errors?"
I've filtered my search to only look for 1 type of error.

"Does it matter?"
No.

Thank you, I'll give this a try.
