Alerting

I want to know if my alert condition is possible

magilbert1
Explorer

Hi

I'm trying to create an alert that will be triggered if I have errors every 5 minutes for 30 minutes.

I'm not sure if that's possible.

Thanks for your help.

1 Solution

woodcock
Esteemed Legend

Like this:

index=youShouldAlwaysSpecifyAnIndex AND sourcetype=AndSourcetypeToo earliest=-30m@m latest=now
| bin _time span=5m
| stats count BY _time application
| stats count BY application
| where count >= 6
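
To try the search above without real data, the same pipeline can be fed synthetic events. This is only a sketch: the application names appA and appB and the field name windows are made up for illustration, and it assumes makeresults is available in your Splunk version. appA gets an event every minute or two across the last 30 minutes, while appB only appears in the last 10 minutes.

```spl
| makeresults count=30
| streamstats count AS n
| eval _time = now() - (n * 60)
| eval application = if(n <= 10 AND n % 2 == 0, "appB", "appA")
| bin _time span=5m
| stats count BY _time application
| stats count AS windows BY application
| where windows >= 6
```

Under those assumptions only appA should survive the final where, because appB touches at most three 5-minute bins.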


0 Karma


magilbert1
Explorer

It seems to work.

But why are all my count results 7?
Shouldn't the count never exceed 6 (30 min / 5 min = 6)?

0 Karma

woodcock
Esteemed Legend

Sometimes it will be 6 and sometimes 7 because the 5-minute periods might be like this:

now=5:58, -30m@m=5:28
bin1=5:25-5:30
bin2=5:30-5:35
bin3=5:35-5:40
bin4=5:40-5:45
bin5=5:45-5:50
bin6=5:50-5:55
bin7=5:55-6:00

So you can add |head 6 or |tail 6 to trim the partial bin from one side or the other.
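
One thing to keep in mind: head 6 and tail 6 keep six rows overall, not six rows per application, so they fit best when the results are split by _time only. A possible alternative, sketched here under the assumption that bin span=5m aligns its buckets to 5-minute marks of epoch time, is to search slightly further back and keep only events inside the last six complete bins. The field names window_start, window_end and windows are made up for this sketch.

```spl
index=youShouldAlwaysSpecifyAnIndex sourcetype=AndSourcetypeToo earliest=-35m@m latest=now
| eval window_end = floor(now() / 300) * 300
| eval window_start = window_end - 1800
| where _time >= window_start AND _time < window_end
| bin _time span=5m
| stats count BY _time application
| stats count AS windows BY application
| where windows >= 6
```

With now=5:58 this keeps 5:25 through 5:55, which is exactly six full bins, so windows can never reach 7.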

0 Karma

magilbert1
Explorer

OK, thank you very much, that helps me a lot.

0 Karma

magilbert1
Explorer

I now have these two different searches for my problem,
but I can't figure out how to count the number of 5-minute windows that have one or more errors.

index="MyIndex" earliest=-30m@m latest=@m | bin _time span=5m | stats count by _time | where count >0

index="Myindex" earliest=-30m@m latest=@m | streamstats time_window=5m count | where count > 0

0 Karma

acharlieh
Influencer

Assuming every event in Myindex is an error... (if not you need to adjust the search prior to the first pipe)...

index="MyIndex" earliest=-30m@m latest=@m | bin _time span=5m | stats count by _time | where count >0

gives you one result for each 5 minute window that has at least 1 error so:

index="MyIndex" earliest=-30m@m latest=@m | bin _time span=5m | stats count by _time | where count >0 | stats count

would then give you the number of 5 minute windows with at least 1 error.
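
A more compact variant of the same idea, as a sketch: once _time has been bucketed, counting its distinct values gives the number of 5-minute windows that contain at least one event. The field name windows_with_errors is made up here, and as above this assumes every event in the index is an error.

```spl
index="MyIndex" earliest=-30m@m latest=@m
| bin _time span=5m
| stats dc(_time) AS windows_with_errors
```

An empty window produces no bucketed _time value at all, so it is simply not counted.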

0 Karma

magilbert1
Explorer

I also need to make sure that the errors come from the same application.
For example, error logs from two different applications could together cover six 5-minute windows, but that case should not trigger an alert.

0 Karma

acharlieh
Influencer

That's exactly why I suggested bringing an additional dimension/field through both stats commands in my answer. You have a field in your events identifying the application; you need to split by that field too.

0 Karma

woodcock
Esteemed Legend

I would use this, to avoid even buckets:

index=myIndex earliest=-30m@m latest=@m | streamstats time_window=5m count BY application | where count > 0
0 Karma

magilbert1
Explorer

If I understand the query correctly, it will return results only when there is more than 0 results in a 5-minute window.

But how do I know that I have had errors non-stop for 30 minutes?
I need something that counts how many 5-minute windows contain one or more errors.
I need to know that all 6 windows in the 30 minutes have errors in them.

0 Karma

woodcock
Esteemed Legend

Good point; this will not work for that case. Let me put in a new answer.

0 Karma

acharlieh
Influencer

Yes, it's possible... Your base search should look for errors, and you need your search time window to be 30 minutes wide (earliest=-30m@m latest=@m or something similar).

Then you'd use bin to bucket up your _time value to every 5 minutes.

You can get the errors by every 5 minute bucket with: stats count by _time,

Then keep only those where you have 5 errors or more per bucket with where count >= 5

Repeating similar processes without time, you can now get the number of timespans with 5 or more errors with: stats count by <other dimensions like host?>

and then where count = 6 to get down to those other dimensions with 5 errors every 5 minutes (because 6*5min = 30min... but check me on this, as off-by-one errors are one of the two hard problems in computer science, along with cache invalidation and naming things).

That's essentially the outline of the search to do this.
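
Put together, the outline above might look something like the sketch below. It is not a tested alert: host stands in for whatever dimension you split by, the index name is a placeholder, and buckets is a made-up field name. Note that the split field has to be carried through the first stats so that the second stats can still group by it. No subsearch is needed, because the two stats commands simply run one after the other.

```spl
index=myIndex earliest=-30m@m latest=@m
| bin _time span=5m
| stats count BY _time host
| where count >= 5
| stats count AS buckets BY host
| where buckets >= 6
```

The last step uses >= 6 rather than = 6 because a 30-minute search window that does not start on a 5-minute mark can touch seven buckets instead of six.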

0 Karma

magilbert1
Explorer

Okay Thanks I'll try this.

0 Karma

magilbert1
Explorer

Will it need a subsearch (to have two "BY" clauses)?
Can you give me an example of the structure?

For now I have: index=myIndex earliest=-30m@m latest=@m | bin _time span=5m | stats count by _time | where count > 0

0 Karma