How to set up an alert when the error count of the latest week is greater than the average of all weeks in the past 30 days?

allladin101
Explorer

Hi All,

Is there a way to set up an alert that fires when the error count of the latest week is greater than the mean of all the weeks in the past 30 days? My current query is:

index=tms_uat* ERR  earliest=-30d@d latest=-0d@d tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*")| timechart span=7d count by tms_logcat limit=40

Can someone please help me?

strive
Influencer

If you need an answer without the timewrap app, try this:

<Some Search Terms...> earliest=-4w@w latest=-0w@w | bucket _time span=1w | stats count as TotalErrors by _time | eventstats mean(TotalErrors) as Mean | sort 1 -_time | eval alertCode = if(TotalErrors>Mean,1,0)

This is strictly based on the last four weeks (time is snapped to the week). It won't consider data for the current week.

In your case, if 'last week' means 'the last 7 days, ignoring today', then change earliest to -28d@d, latest to -0d@d, and span to 7d.
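
Applied to the search terms from the question, that 'last 7 days, ignoring today' variant might look like the sketch below. The search terms are copied from the original post. The final where is an addition that is not part of the search above: it returns a row only when the latest week exceeds the mean, so the alert can trigger on "number of results is greater than 0". As in the search above, the mean includes the latest week itself, and all matching errors are counted together (the by tms_logcat split from the question is left out).

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-28d@d latest=-0d@d
| bucket _time span=7d
| stats count as TotalErrors by _time
| eventstats mean(TotalErrors) as Mean
| sort 1 -_time
| eval alertCode = if(TotalErrors>Mean,1,0)
| where alertCode=1
```

It is worth checking that the 7d buckets line up with the search window. If they do not, the first and last buckets cover partial weeks and the comparison is skewed.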

As I said in my earlier comment, it is not a good idea to run searches on raw data if the log volume is very high.

MuS
Legend

Hi alladin101,

This is another good use case for the timewrap app. Take this run-everywhere search and adapt it to your needs:

index=_internal source=*metrics.log earliest=-30d@d 
| timechart span=1w count 
| timewrap w series=short 
| eval mean=(s1+s2+s3)/3 
| where s0 > mean

The timechart counts events for each week, timewrap groups each week into new fields called s0, s1, and so on, the eval calculates the mean of the previous three weeks, and the where checks whether the latest week's event count is higher than that mean.
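
One detail worth noting: this eval averages only the three earlier weeks (s1 to s3), so the latest week is not part of its own baseline. If the timewrap app is not available, roughly the same comparison can be sketched with core commands. This is an untested sketch over four complete weeks. The field names latestWeek, priorErrors and MeanPrior are made up for illustration, and <Some Search Terms...> stands for your own search terms.

```spl
<Some Search Terms...> earliest=-4w@w latest=@w
| bucket _time span=1w
| stats count as TotalErrors by _time
| eventstats max(_time) as latestWeek
| eval priorErrors = if(_time < latestWeek, TotalErrors, null())
| eventstats avg(priorErrors) as MeanPrior
| where _time = latestWeek AND TotalErrors > MeanPrior
```

The search returns a single row when the latest week is above the mean of the earlier weeks and nothing otherwise, so an alert on it can trigger when the number of results is greater than 0.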

But remember, depending on the event count this can take some time to complete.

Hope this helps to get you started ...

cheers, MuS

allladin101
Explorer

Volume is not evenly distributed, but we can say it is mostly high.

Not using any summary index yet.

strive
Influencer

What is your log volume? If it is high, then it is not a good idea to run the search on the last weeks' raw data.

Are you summarizing data and storing it in some summary index?
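
If not, a summary index is one way to avoid scanning weeks of raw events every time the alert runs. As a rough, untested sketch, a search scheduled once a day could write the previous day's error count with the collect command. Here <your_summary_index> is a placeholder for an index you would have to create first, and DailyErrors is an illustrative field name. The search terms are copied from the question.

```spl
index=tms_uat* ERR tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*") earliest=-1d@d latest=@d
| bucket _time span=1d
| stats count as DailyErrors by _time
| collect index=<your_summary_index>
```

The alert search would then read from that index instead of the raw data and use stats sum(DailyErrors) as TotalErrors by _time in place of the raw count.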
