Alerting

How to set up an alert when the error count of the latest week is greater than the average of all weeks in the past 30 days?

allladin101
Explorer

Hi All,

I want to check if there is a way by which I could set up an alert when the error count of the latest week is greater than the mean of all the weeks in the past 30 days. My current query is:

index=tms_uat* ERR  earliest=-30d@d latest=-0d@d tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*")| timechart span=7d count by tms_logcat limit=40

Can someone please help me?

1 Solution

strive
Influencer

If you need an answer without the timewrap app, try this:

<Some Search Terms...> earliest=-4w@w latest=-0w@w 
| bucket _time span=1w 
| stats count as TotalErrors by _time 
| eventstats mean(TotalErrors) as Mean 
| sort 1 -_time 
| eval alertCode = if(TotalErrors>Mean,1,0)

This is strictly based on the last four weeks (time is snapped to the week). It won't consider data for the current week.

In your case, if "last week" means "the last 7 days ignoring today", then change earliest to -28d@d, latest to -0d@d, and span to 7d.
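Putting those day-snapped parameters together, the adapted search might look like this (a sketch that assumes the same index and error filters as the original question):

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-28d@d latest=-0d@d 
| bucket _time span=7d 
| stats count as TotalErrors by _time 
| eventstats mean(TotalErrors) as Mean 
| sort 1 -_time 
| eval alertCode = if(TotalErrors>Mean,1,0)
```

Saved as an alert, this could trigger whenever alertCode=1, i.e. when the most recent 7-day bucket exceeds the mean of the four buckets.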

As noted in my earlier comment, it is not advisable to run searches over raw data if the log volume is very high.


MuS
SplunkTrust
SplunkTrust

Hi allladin101,

this is another good use case for the timewrap app. Take this run everywhere command and adapt it to your needs:

index=_internal source=*metrics.log earliest=-30d@d 
| timechart span=1w count 
| timewrap w series=short 
| eval mean=(s1+s2+s3)/3 
| where s0 > mean

The timechart counts events for each week; timewrap groups each week into new fields called s0, s1, ...; the eval calculates the mean of the last three weeks; and the where checks whether the latest week's event count is higher than the mean.

But remember, depending on the event count this can take some time to complete.

hope this helps to get you started ...

cheers, MuS


allladin101
Explorer

Volume is not evenly distributed, but we could say it is mostly high.

Not using any summary index yet.


strive
Influencer

What is your log volume? If it is high, it is not advisable to run the search over the last weeks' raw data.

Are you summarizing data and storing it in some summary index?
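As a sketch of what summarizing could look like: a scheduled search could roll up each day's error counts into a summary index, and the alert search would then read from the summary instead of raw events. (The index name err_summary and the daily schedule here are assumptions for illustration, not from the thread.)

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-1d@d latest=@d 
| stats count as DailyErrors by tms_logcat 
| collect index=err_summary
```

The weekly comparison search would then run against index=err_summary over the past 30 days, which touches far fewer events than searching the raw data.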
