Alerting

Splunk: complicated statistical outliers from changing data

nmayafit
Path Finder

Hi all,

I found some partial information about what I need, but I still haven't found a solution, so I'm asking for help.
I'm sending SMS messages to customers in different countries and logging the undelivered count and the total count.
I want to alert when the undelivered count crosses a boundary (which I need to calculate), i.e. detect the anomaly.
So far I have calculated the percentage of undelivered messages out of the total, taken the average and standard deviation of that percentage, and set the boundary to avg+stdev*2.
But this is only good for normally distributed data. The problem is that SMS volume is not constant: the total decreases at night and on weekends and increases on weekdays. So I need to add a variable to my calculation that takes this into account, raising the boundary when the total is low (I don't want the alert to trigger when 5 out of 10 messages were not delivered, 50%) and lowering it when the total is high (I do want it to trigger when 100 out of 10000 messages were not delivered, 1%).
This is my query:

... | bin _time span=1h
 | stats count(eval(status=="undelivered" OR status=="failed")) as undelivered count as all by _time
 | eval percent=undelivered/all*100
 | bin _time span=1d
 | streamstats window=100 avg(percent) as avg stdev(percent) as stdev by _time
 | eval upperBound=(avg+stdev*2)
 | eval isOutlier=if(percent > upperBound AND undelivered >= 50, upperBound, 0)
 | fields "_time", "percent", "upperBound", "avg", "stdev", "isOutlier", "all", "undelivered"

The query is for one country; once it works I will create it for all countries.
As you can see, I also added a minimum of 50 undelivered messages, so the alert never triggers below that.
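
To make the behavior I'm after concrete, here is a rough sketch in Python rather than SPL, only to show the arithmetic. It assumes a binomial-style boundary, baseline + z*sqrt(baseline*(1-baseline)/total), which is my guess at a suitable formula and not something I have working in Splunk. The 0.5% baseline rate and the function names are made up for illustration; the minimum of 50 undelivered and the factor of 2 are the ones from my query.

```python
import math


def upper_bound(baseline_rate, total, z=2.0):
    """Upper limit for the undelivered fraction at a given hourly volume.

    The binomial standard error sqrt(p*(1-p)/n) shrinks as n grows, so the
    limit is wide for low volume and tight for high volume.
    """
    if total <= 0:
        return 1.0  # no traffic: nothing can exceed the limit
    se = math.sqrt(baseline_rate * (1.0 - baseline_rate) / total)
    return min(1.0, baseline_rate + z * se)


def is_outlier(undelivered, total, baseline_rate, min_undelivered=50, z=2.0):
    """Flag an hour only if it passes the count floor and exceeds the limit."""
    if total <= 0 or undelivered < min_undelivered:
        return False
    return undelivered / total > upper_bound(baseline_rate, total, z)


# ASSUMPTION: hypothetical long-run undelivered rate of 0.5%, not measured data.
BASELINE = 0.005

for undelivered, total in [(5, 10), (100, 10000)]:
    bound = upper_bound(BASELINE, total)
    flagged = is_outlier(undelivered, total, BASELINE)
    print(f"{undelivered}/{total}: bound={bound:.4%} outlier={flagged}")
```

With these assumed numbers the boundary is about 5% at 10 messages and about 0.64% at 10000, so 100 out of 10000 is flagged, while 5 out of 10 is held back by the minimum of 50.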

I know I can do this with the Splunk Machine Learning Toolkit, but currently I can't install it on our machine.
Any help is appreciated.

Adding some graphs of one country for reference:

This is the total delivered:
[image: graph of total delivered]

This is the percent of undelivered:
[image: graph of percent undelivered]

Thanks!!

(If you need any more graphs, just say so. It wouldn't let me upload more images in the question.)


mattymo
Splunk Employee

Hey!

Check out this approach and let me know if it works or how you need to tweak it for your data:

https://answers.splunk.com/answers/511894/how-to-use-the-timewrap-command-and-set-an-alert-f.html?ch...

- MattyMo