I have found partial information on this but no complete solution, so I'm asking for help.
I'm sending SMS messages to customers in different countries and logging the undelivered count and the total count.
I want to raise an alert when the undelivered count crosses a boundary (which I need to calculate). In other words, I want to detect anomalies.
So far I calculate the undelivered percentage of the total, take the average and standard deviation of that percentage, and set the boundary to avg + 2 * stdev.
But this only works well when the percentage is roughly normally distributed. The problem is that SMS volume is not constant: the total drops at night and on weekends and rises on weekdays. So I need to add a variable to my calculation that takes this into account, raising the boundary when the total is low (I don't want the alert to trigger when 5 out of 10 messages were not delivered, 50%) and lowering it when the total is high (I do want it to trigger when 100 out of 10000 messages were not delivered, 1%).
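To make the requirement concrete, a boundary of this kind could look like the following minimal Python sketch. It assumes the undelivered count behaves roughly like a binomial variable, so the allowed deviation shrinks with the square root of the total. The function names, the 0.5% baseline rate and z = 2 are illustrative assumptions, not values taken from the data. The minimum of 50 undelivered is the one already used in the query.

```python
import math

def upper_bound(baseline_rate, total, z=2.0):
    """Upper limit for the undelivered fraction when `total` messages were sent.

    Uses the binomial standard error sqrt(p * (1 - p) / n), so the limit
    is wide for small totals and tight for large ones.
    """
    if total <= 0:
        return 1.0  # no traffic: nothing can exceed the limit
    stderr = math.sqrt(baseline_rate * (1.0 - baseline_rate) / total)
    return min(1.0, baseline_rate + z * stderr)

def is_outlier(undelivered, total, baseline_rate, z=2.0, min_undelivered=50):
    """Flag the interval only if the rate is above the limit AND the
    absolute undelivered count reaches the minimum."""
    if total <= 0 or undelivered < min_undelivered:
        return False
    return undelivered / total > upper_bound(baseline_rate, total, z)

# Hypothetical baseline: 0.5% of messages are normally undelivered.
BASELINE = 0.005
low_volume = is_outlier(5, 10, BASELINE)        # 5 of 10 undelivered
high_volume = is_outlier(100, 10000, BASELINE)  # 100 of 10000 undelivered
```

With these assumed inputs the limit is about 5% for a total of 10 and about 0.64% for a total of 10000, so the high-volume case is flagged while the low-volume case is suppressed by the minimum count.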
This is my query:
... | bin _time span=1h
| stats count(eval(status=="undelivered" OR status=="failed")) as undelivered count as all by _time
| eval percent=undelivered/all*100
| bin _time span=1d
| streamstats window=100 avg(percent) as avg stdev(percent) as stdev by _time
| eval upperBound=(avg+stdev*2)
| eval isOutlier=if(percent > upperBound AND undelivered >= 50, upperBound, 0)
| fields "_time", "percent", "upperBound", "avg", "stdev", "isOutlier", "all", "undelivered"
The query covers one country; once it works I will extend it to all countries.
As you can see, I also added a minimum of 50 undelivered messages so the alert never triggers below that.
I know I could do this with the Splunk Machine Learning Toolkit, but I currently can't install it on our machine.
Any help will do.
Here are some graphs for one country, for reference:
This is the total delivered:
This is the percent of undelivered:
(If you need more graphs, just ask. The site wouldn't let me upload more images in the question.)