Alerting

How you detect an anomaly from a time frame the previous day or week during the same time of day?

mhanda
New Member

I'm trying to detect and alert on anomalies happening now from the previous weeks during the same time frame today. There are a couple of questions and solutions that have been answered that are similar to what I am looking for(like this and this) but I'm struggling to cobble it together.
I currently have a search that that will work if there is a drop in an average using sma:

 index=<env> <log source> 
 | timechart count(payload.<event>*) as selected* span=1h 
 | eval packages= <event-value 1> + <event-value 2> + <event-value 3> + <event-value 4> 
 | trendline sma2(packages) as moving_avg_packages 
 | eval drop=if((moving_avg_packages / 3.03) > packages, 0, 1)

I have the packages eval because it provides some insight into if an event triggers, you can see easily see a possible pattern and it provides a little sanity. The boolean makes it easy to trigger the alert. I have a cron run every 30m to allow enough time for the data to collect to prevent false alerts.

It would be great to compare packages from the same time period, possibly as an sma in something like 2h increments from the previous 4 weeks.
This looks interesting as a way to check for anomaly going both ways, but was a little beyond me to make work in the above working search in a way that was productive to my need:

| streamstats window=2 mean(count) as sma2 by packages | eval outlier = if(count > (3 * sma2), count - (3 * sma2), null()) 

Any help that could be provided would be really awesome. I have spent some hours trying to get it together. I would like to do it for the days before in the same fashion as the weeks but figured that if I could get weekly trends to average and alert, I could figure out days easy enough.

0 Karma

mhanda
New Member

Unfortunately, it didn't answer my question. I did try to use the information to my advantage but was unsuccessful. Perhaps your solution was beyond me but I also haven't had much time to continue to try to make it work.

0 Karma

skoelpin
SplunkTrust
SplunkTrust

You should add a bounty so the question gets answered rather than leaving it open.

0 Karma

skoelpin
SplunkTrust
SplunkTrust

I actually just did a very similar project just like this one.. It worked by using relative_time(now(),"") to look back in time and take a time slice on the same day and hour from the previous week. It would then have seven 10 minute spans. It would look back for a total of 5 weeks and average those 7 points together per time span and use a timeshift to push the "prediction" forward in time. I packaged this up in a macro and passed arguments into the macro to specify confidence interval, relative time, days to predict, and field to predict. I then created a summary index and pushed the metrics into it (Upper bounds, lower bounds, and prediction). I then created a second populating search to populate the observed value. This gives us the ability to trend the observed values onto the prediction chart and contneously validate how the prediction was doing. Lastly, I added business rule (alerts) which trend anytime the actual value goes below or above the limits.

To get a visual on how this works, add | timewrap 5w to your search and watch how theres 5 weeks of data sitting on top of each other. The search above will grab an hour's worth of data and have 10 minute timespans and calculate the average between the 5 points in each timespan. This will be your prediction

I would strongly recommend using a data model or summary index to do this as querying over raw data sets of 5 weeks back can result in long running searches.

If anyones interested, I can grab my time-shift search and post it up

Edit: Here's the search

index=xxxxx sourcetype=xxxxxx  
| bin _time span=10m
| lookup xxxxxxxxx.csv customer_id
| stats count by customer, _time


| eval w=case( 
(_time>relative_time(now(), "$reltime$@d-5w-30m") AND _time<=relative_time(now(), "$reltime$@d-5w+$days$d+30m")), 5,
(_time>relative_time(now(), "$reltime$@d-4w-30m") AND _time<=relative_time(now(), "$reltime$@d-4w+$days$d+30m")), 4,
(_time>relative_time(now(), "$reltime$@d-3w-30m") AND _time<=relative_time(now(), "$reltime$@d-3w+$days$d+30m")), 3,
(_time>relative_time(now(), "$reltime$@d-2w-30m") AND _time<=relative_time(now(), "$reltime$@d-2w+$days$d+30m")), 2,
(_time>relative_time(now(), "$reltime$@d-1w-30m") AND _time<=relative_time(now(), "$reltime$@d-1w+$days$d+30m")), 1)
| eval shift=case(isnotnull(w),"+"+w+"w-30m,+"+w+"w-20m,+"+w+"w-10m,+"+w+"w-0m,+"+w+"w+10m,+"+w+"w+20m,+"+w+"w+30m,")
| where isnotnull(shift) 
| makemv delim="," shift 
| mvexpand shift 
| eval time=relative_time(_time,shift)
| eventstats avg($val$) AS pred by time, customer
| eval upper=if($val$>pred,$val$,pred) 
| eval lower=if($val$<pred,$val$,pred) 
| eval lower=if(lower=0,"",lower) 
| eventstats avg($val$) AS pred, stdev($val$) as pred_stdev, by time, customer
| eval upper=if(upper>pred+1*pred_stdev,pred_stdev*0.5+pred,upper)
| eval lower=if(lower <pred-1*pred_stdev,pred_stdev*0.5+pred, lower)
| stats avg($val$) AS pred, stdev(upper) AS ustdev, stdev(lower) AS lstdev by time, customer 
| eval low=pred-lstdev*(sqrt(1/(1-$confidence$/100))) 
| eval low=if(low<0, 1, low) 
| eval high=pred+ustdev*(sqrt(1/(1-$confidence$/100))) 
| eval _time=time 
| timechart span=10m useother=false limit=0 cont=false  min(pred) as pred , min(low) as low, min(high) as high by customer
| makecontinuous _time
| where _time>relative_time(now(),"-4h”)

IMPORTANT - You must set your timerange to the last 5 weeks for this to work properly

There's 4 arguments passed into this macro which are $val$ = field to predict, $reltime$ = relative time to start prediction, $confidence = confidence interval of the boundaries, and $days$ = how many days into the future to predict. By default you can do 2 days into the future, if you want to do more, you have to change MAX_DAYS_HENCE in props.

The macro represents everything from the first eval down. The macro requires you to timechart into it with the field you are wanting to predict (Example above, I'm using a timechart to calculate a field called "count" so my prediction field must be "count"). Lastly, the line | eval time=relative_time(_time,shift) is responsible for shifting the prediction into the future.

mhanda
New Member

Sweet. Thanks for the update, skoelpin. I'll take a look. I'm sure that there is something to help me in my journey in this. I really appreciate it!

0 Karma

skoelpin
SplunkTrust
SplunkTrust

Did this answer your question? If so can you accept it and close it out?

0 Karma

somesoni2
SplunkTrust
SplunkTrust

+1 for example search.

0 Karma

mhanda
New Member

I would like to see your search as I am still banging my head on my desk over this.

0 Karma

skoelpin
SplunkTrust
SplunkTrust

I updated my post with the search. Don't bang your head, this is a very hard problem to solve and took a while to get it working properly.

0 Karma

mhanda
New Member

My links didn't come through, I just noticed, that I was referencing. "answered that are similar to what I am looking for(like this and this)" should have:

0 Karma

esix_splunk
Splunk Employee
Splunk Employee

Look at the timewrap command : https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/SearchReference/Timewrap

You can add something like the following to your search...

........ | timewrap 1day

Might be of use..

0 Karma

mhanda
New Member

Thanks. That kind of gets me close in that it wraps up each column into day numbered values in hour rows. I'll play with that a bit.

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...