Solved: Re: How to set up alert when error count of latest...

allladin101 · ‎08-20-2014

Hi All,

Is there a way to set up an alert that fires when the error count of the latest week is greater than the mean of the weekly error counts over the past 30 days? My current query is:

index=tms_uat* ERR  earliest=-30d@d latest=-0d@d tms_logcat="ERR-*" NOT ("SSL Error: Error on Read errno 104*")| timechart span=7d count by tms_logcat limit=40

Can someone please help me?

strive · ‎08-21-2014

If you need an answer that does not use the timewrap app, try this:

<Some Search Terms...> earliest=-4w@w latest=-0w@w | bucket _time span=1w | stats count as TotalErrors by _time | eventstats mean(TotalErrors) as Mean | sort 1 -_time | eval alertCode = if(TotalErrors>Mean,1,0)

This is strictly based on the last four weeks (time is snapped to the week boundary). It won't consider data for the current week.

In your case, if 'last week' means the last 7 days ignoring today, change earliest to -28d@d, latest to -0d@d, and span to 7d.

As I said in my earlier comment, it is not advisable to run searches on raw data if the log volume is very high.
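For anyone who wants to check the logic of that search outside Splunk, here is a minimal Python sketch of the same steps: bucket events into 7-day windows, take the mean of the window counts, and flag the newest window if it is above the mean. The function names and sample timestamps are made up for illustration, and the windows are anchored to the end of the range, which may not match how Splunk aligns its buckets.

```python
from datetime import datetime, timedelta
from statistics import mean


def weekly_counts(event_times, end, weeks=4):
    """Count events in `weeks` consecutive 7-day windows ending at `end`.

    Returns counts ordered oldest to newest; events outside the range are ignored.
    """
    start = end - timedelta(days=7 * weeks)
    counts = [0] * weeks
    for t in event_times:
        if start <= t < end:
            counts[(t - start).days // 7] += 1
    return counts


def alert_code(counts):
    """Return 1 if the newest window's count exceeds the mean of all windows, else 0."""
    return 1 if counts[-1] > mean(counts) else 0


# Hypothetical sample: midnight of "today" and six error events.
end = datetime(2014, 8, 21)
events = [end - timedelta(days=d) for d in (1, 2, 3, 10, 17, 24)]
counts = weekly_counts(events, end)
print(counts, alert_code(counts))  # [1, 1, 1, 3] 1
```

As in the search, the mean here includes the newest window itself.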

MuS · ‎08-21-2014

Hi alladin101,

This is another good use case for the timewrap app. Take this run-anywhere search and adapt it to your needs:

index=_internal source=*metrics.log earliest=-30d@d 
| timechart span=1w count 
| timewrap w series=short 
| eval mean=(s1+s2+s3)/3 
| where s0 > mean

The timechart counts events for each week; timewrap groups the weeks into new fields called s0, s1, and so on; the eval calculates the mean of the three weeks before the latest one (s1 to s3); and the where checks whether the latest week's event count (s0) is higher than that mean.

But remember, depending on the event count this can take some time to complete.
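The comparison itself is simple arithmetic, and a small Python sketch may make it easier to reason about. It assumes the weekly counts are already available as a list ordered like the s0, s1, s2, s3 fields above (latest week first); the function names and numbers are hypothetical.

```python
from statistics import mean


def latest_exceeds_prior_mean(series):
    """series[0] is the latest week (s0); series[1:] are the earlier weeks."""
    return series[0] > mean(series[1:])


def latest_exceeds_overall_mean(series):
    """Same check, but the mean also includes the latest week."""
    return series[0] > mean(series)


print(latest_exceeds_prior_mean([50, 10, 20, 30]))  # True: 50 > 20
print(latest_exceeds_prior_mean([20, 10, 20, 30]))  # False: 20 is not > 20
```

Whether the latest week is included in the mean does not change the decision: the overall mean always lies between the prior mean and s0, so s0 is above one exactly when it is above the other. Only the size of the margin differs.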

Hope this helps to get you started.

cheers, MuS

allladin101 · ‎08-21-2014

Volume is not evenly distributed, but we can say it's mostly high.

Not using any summary index yet.

strive · ‎08-21-2014

What is your log volume? If it is high, it is not advisable to run the search on the raw data of the past weeks.

Are you summarizing data and storing it in some summary index?
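To illustrate why summarizing helps, here is a rough Python sketch that rolls pre-computed daily counts up into 7-day totals instead of re-reading every raw event. It assumes a daily count per date is already stored somewhere (the dictionary below is a stand-in with made-up numbers), and the function name is hypothetical.

```python
from datetime import date, timedelta


def weekly_totals(daily_counts, end, weeks=4):
    """Sum per-day counts into `weeks` 7-day windows ending the day before `end`.

    Returns totals ordered oldest to newest; days with no entry count as 0.
    """
    start = end - timedelta(days=7 * weeks)
    totals = [0] * weeks
    for offset in range(7 * weeks):
        day = start + timedelta(days=offset)
        totals[offset // 7] += daily_counts.get(day, 0)
    return totals


# Hypothetical daily summary: today's partial count is ignored.
end = date(2014, 8, 21)
daily = {
    end: 99,
    end - timedelta(days=1): 30,
    end - timedelta(days=8): 10,
    end - timedelta(days=9): 5,
}
print(weekly_totals(daily, end))  # [0, 0, 15, 30]
```

The weekly comparison then only touches 28 small rows, however many raw events were logged.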

How to set up alert when error count of latest week is greater than average of all weeks in past 30 days?
