I'm trying to detect and alert on anomalies happening now from the previous weeks during the same time frame today. There are a couple of questions and solutions that have been answered that are similar to what I am looking for(like this and this) but I'm struggling to cobble it together.
I currently have a search that that will work if there is a drop in an average using sma:
index=<env> <log source> | timechart count(payload.<event>*) as selected* span=1h | eval packages= <event-value 1> + <event-value 2> + <event-value 3> + <event-value 4> | trendline sma2(packages) as moving_avg_packages | eval drop=if((moving_avg_packages / 3.03) > packages, 0, 1)
I have the packages eval because it provides some insight into if an event triggers, you can see easily see a possible pattern and it provides a little sanity. The boolean makes it easy to trigger the alert. I have a cron run every 30m to allow enough time for the data to collect to prevent false alerts.
It would be great to compare packages from the same time period, possibly as an sma in something like 2h increments from the previous 4 weeks.
This looks interesting as a way to check for anomaly going both ways, but was a little beyond me to make work in the above working search in a way that was productive to my need:
| streamstats window=2 mean(count) as sma2 by packages | eval outlier = if(count > (3 * sma2), count - (3 * sma2), null())
Any help that could be provided would be really awesome. I have spent some hours trying to get it together. I would like to do it for the days before in the same fashion as the weeks but figured that if I could get weekly trends to average and alert, I could figure out days easy enough.
Look at the timewrap command : https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/SearchReference/Timewrap
You can add something like the following to your search...
........ | timewrap 1day
Might be of use..
Thanks. That kind of gets me close in that it wraps up each column into day numbered values in hour rows. I'll play with that a bit.
My links didn't come through, I just noticed, that I was referencing. "answered that are similar to what I am looking for(like this and this)" should have:
I actually just did a very similar project just like this one.. It worked by using
relative_time(now(),"") to look back in time and take a time slice on the same day and hour from the previous week. It would then have seven 10 minute spans. It would look back for a total of 5 weeks and average those 7 points together per time span and use a timeshift to push the "prediction" forward in time. I packaged this up in a macro and passed arguments into the macro to specify confidence interval, relative time, days to predict, and field to predict. I then created a summary index and pushed the metrics into it (Upper bounds, lower bounds, and prediction). I then created a second populating search to populate the observed value. This gives us the ability to trend the observed values onto the prediction chart and contneously validate how the prediction was doing. Lastly, I added business rule (alerts) which trend anytime the actual value goes below or above the limits.
To get a visual on how this works, add
| timewrap 5w to your search and watch how theres 5 weeks of data sitting on top of each other. The search above will grab an hour's worth of data and have 10 minute timespans and calculate the average between the 5 points in each timespan. This will be your prediction
I would strongly recommend using a data model or summary index to do this as querying over raw data sets of 5 weeks back can result in long running searches.
If anyones interested, I can grab my time-shift search and post it up
Edit: Here's the search
index=xxxxx sourcetype=xxxxxx | bin _time span=10m | lookup xxxxxxxxx.csv customer_id | stats count by customer, _time | eval w=case( (_time>relative_time(now(), "$reltime$@d-5w-30m") AND _time<=relative_time(now(), "$reltime$@d-5w+$days$d+30m")), 5, (_time>relative_time(now(), "$reltime$@d-4w-30m") AND _time<=relative_time(now(), "$reltime$@d-4w+$days$d+30m")), 4, (_time>relative_time(now(), "$reltime$@d-3w-30m") AND _time<=relative_time(now(), "$reltime$@d-3w+$days$d+30m")), 3, (_time>relative_time(now(), "$reltime$@d-2w-30m") AND _time<=relative_time(now(), "$reltime$@d-2w+$days$d+30m")), 2, (_time>relative_time(now(), "$reltime$@d-1w-30m") AND _time<=relative_time(now(), "$reltime$@d-1w+$days$d+30m")), 1) | eval shift=case(isnotnull(w),"+"+w+"w-30m,+"+w+"w-20m,+"+w+"w-10m,+"+w+"w-0m,+"+w+"w+10m,+"+w+"w+20m,+"+w+"w+30m,") | where isnotnull(shift) | makemv delim="," shift | mvexpand shift | eval time=relative_time(_time,shift) | eventstats avg($val$) AS pred by time, customer | eval upper=if($val$>pred,$val$,pred) | eval lower=if($val$<pred,$val$,pred) | eval lower=if(lower=0,"",lower) | eventstats avg($val$) AS pred, stdev($val$) as pred_stdev, by time, customer | eval upper=if(upper>pred+1*pred_stdev,pred_stdev*0.5+pred,upper) | eval lower=if(lower <pred-1*pred_stdev,pred_stdev*0.5+pred, lower) | stats avg($val$) AS pred, stdev(upper) AS ustdev, stdev(lower) AS lstdev by time, customer | eval low=pred-lstdev*(sqrt(1/(1-$confidence$/100))) | eval low=if(low<0, 1, low) | eval high=pred+ustdev*(sqrt(1/(1-$confidence$/100))) | eval _time=time | timechart span=10m useother=false limit=0 cont=false min(pred) as pred , min(low) as low, min(high) as high by customer | makecontinuous _time | where _time>relative_time(now(),"-4h”)
IMPORTANT - You must set your timerange to the last 5 weeks for this to work properly
There's 4 arguments passed into this macro which are $val$ = field to predict, $reltime$ = relative time to start prediction, $confidence = confidence interval of the boundaries, and $days$ = how many days into the future to predict. By default you can do 2 days into the future, if you want to do more, you have to change
MAX_DAYS_HENCE in props.
The macro represents everything from the first
eval down. The macro requires you to timechart into it with the field you are wanting to predict (Example above, I'm using a timechart to calculate a field called "count" so my prediction field must be "count"). Lastly, the line
| eval time=relative_time(_time,shift) is responsible for shifting the prediction into the future.
I updated my post with the search. Don't bang your head, this is a very hard problem to solve and took a while to get it working properly.
Sweet. Thanks for the update, skoelpin. I'll take a look. I'm sure that there is something to help me in my journey in this. I really appreciate it!