I am tracking HTTP 500 errors on a daily basis. The average usually stays constant, but sometimes it increases by more than 50%. When this happens, I want Splunk to send an alert.
My current search:
index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d | timechart span=1d count | timewrap 1d
So if the moving average deviates more than 50% above the average for the past 6 days, I want Splunk to alert me.
Run this search every 6 hours for the last 24 hours:
index=vertex7-access RTG_Error="500" | stats count | where count > 1.5 * [ search index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d | bucket _time span=1d | stats count by _time | stats avg(count) as AvgDailyError500Count | return $AvgDailyError500Count ]
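To see what the threshold piece evaluates to on its own, you can run just the subsearch:

index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d | bucket _time span=1d | stats count by _time | stats avg(count) as AvgDailyError500Count

That returns a single row with the 6-day average daily count, which return substitutes into the outer where clause as a bare number. Since the where clause only passes a row through when the last 24 hours' count exceeds 1.5 times that average, set the alert to trigger when the number of results is greater than 0.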
Hey skoelpin
Try this:
index=vertex7-access RTG_Error="500" earliest=-6d@d latest=-1d@d | timechart span=1d count AS totals | stats avg(totals) AS last_week_avg | appendcols [search index=vertex7-access RTG_Error="500" earliest=-1d@d latest=now | timechart span=1d count AS today_avg] | eval alert = if((today_avg>last_week_avg*1.5),"true","false")
Then you just need to choose "Trigger if custom condition is met" with the condition: search alert="true"
There might be another solution without subsearches but this should work.
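Here is the same search spread over multiple lines in case that is easier to follow (SPL ignores the line breaks, so it is functionally identical):

index=vertex7-access RTG_Error="500" earliest=-6d@d latest=-1d@d
| timechart span=1d count AS totals
| stats avg(totals) AS last_week_avg
| appendcols
    [ search index=vertex7-access RTG_Error="500" earliest=-1d@d latest=now
      | timechart span=1d count AS today_avg ]
| eval alert = if(today_avg > last_week_avg * 1.5, "true", "false")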
Run this search every 6 hours for the last 24 hours:
index=vertex7-access RTG_Error="500" | stats count | where count > 1.5 * [ search index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d | bucket _time span=1d | stats count by _time | stats avg(count) as AvgDailyError500Count | return $AvgDailyError500Count ]
It's only returning the count over the past 6 days. How should I test this?
Do you recommend I change this to ... | where count > 0.5 so that an alert triggers at 50% of the current average?
Would it be possible to create a bar chart showing the count for each day (over the past 6 days) and have a trendline showing the 6-day average? Then I could have the alert go off if the count goes 50% higher.
That is a reasonable way to test it.
You can take out the guts and chart it like this:
index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d | timechart span=1d count
If you need a more compact view, use a sparkline like this:
index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d | stats sparkline(count, 1d)
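If you want the trendline on the same chart, one option (a sketch; the field names six_day_avg and threshold are just placeholders) is to compute the 6-day average alongside the daily counts:

index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d
| timechart span=1d count AS daily_count
| eventstats avg(daily_count) AS six_day_avg
| eval threshold = six_day_avg * 1.5

Render daily_count as columns and six_day_avg / threshold as line overlays; any day where the column crosses the threshold line is a day the alert would have fired.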
Have you tried looking at the predict command?
http://info.prelert.com/blog/anomaly-detective-vs-splunks-anomalies-command-what-is-the-difference
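For reference, a minimal use of predict on this data (just a sketch, using the default algorithm and an assumed 30-day lookback) would be something like:

index=vertex7-access RTG_Error="500" earliest=-30d@d latest=@d
| timechart span=1d count
| predict count

which adds a predicted value with upper and lower confidence bounds that you could alert against instead of a fixed 50% threshold.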
I tested this and it worked perfectly. Thanks!
For moving averages: if you run the search today (Aug 11), you want an alert if the average from Aug 5 - Aug 10 is 50% higher than the average from Aug 4 - Aug 9?
Yes, very close. I want an alert set on a cron schedule to run every 6 hours (it was originally 1 day, but now I want it to be 6 hours). So the alert will have an average from [Aug 5th - Aug 10th], and if at any time the count goes 50% above that ~6-day average, an alert is sent out.
To add onto this with an example:
From [Aug 5th - Aug 10th] the average number of errors per day was 90,000.
Say on August 11th the count is 140,000; we are now more than 50% above our average from [Aug 5th - Aug 10th], and an alert is sent.
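Putting that together, a sketch of the 6-hour scheduled alert (cron schedule 0 */6 * * *) that compares today's running count against the trailing 6-day average could look like:

index=vertex7-access RTG_Error="500" earliest=@d latest=now
| stats count AS today_count
| appendcols
    [ search index=vertex7-access RTG_Error="500" earliest=-6d@d latest=@d
      | timechart span=1d count AS daily
      | stats avg(daily) AS six_day_avg ]
| where today_count > six_day_avg * 1.5

Trigger the alert when the number of results is greater than 0. Note that early in the day today_count will naturally be low, so comparing a partial day against full-day averages means the alert only fires on genuinely large spikes.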