I have data for which I want to be alerted when there is a spike in 5xx errors for individual endpoints. Each endpoint has its own typical 5xx error trend, and traffic varies between day and night, so when traffic is high we will probably see more 5xx errors than when traffic is low.
I tried the interquartile range (IQR) method to check for outliers, but I doubt it fits here: whenever traffic increases, it will alert for no real reason.
Is there a more apt way to do this?
index=rxc sourcetype=rxcapp status=5* endpoint=* | stats count as error by status
This would be my base query.
I tried the query below, but because traffic is variable there will be somewhat more errors in the morning than at night, so this alert would trigger constantly and create spam.
Is there a better way to work with dynamic thresholds, perhaps by calculating the percentage change? And if so, how do I decide on the threshold in that case?
index=rxc sourcetype=rxcapp status=5* endpoint=* earliest=-20m@m latest=now | bucket _time span=2m | stats count as error by _time status endpoint | streamstats median(error) as med p75(error) as p75 p25(error) as p25 by status endpoint | eval iqr=(p75-p25) | eval lower=(med-iqr*1.5) | eval upper=(med+iqr*1.5) | where error>upper | fields _time endpoint error status upper lower med iqr
There are lots of ways to do this, and you probably are not going to want to do it based on a rolling window.
Typically, what we do in this situation is to create a summary index and keep track of the typical rates for any given time of day (and any other characteristics you want to consider, such as weekends or holidays.)
So, let's suppose we use a metric index for this, and we have three dimensions - the endpoint name, a weekday/weekend flag, and the hour of the day.
Every night, you load the summary data for the prior day into the metric index. Right after that, you calculate the p90 for each endpoint over the last 30-90 days, for each hour and each weekday/weekend flag, and put the result into a lookup.
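As a rough sketch of the lookup-building step only, something like the following could work. For brevity it computes the baseline straight from the raw events in the base query rather than from a metric or summary index; with the index described above, the first three lines would read from that instead. The lookup file name endpoint_5xx_baseline.csv, the field names hour, daytype and p90_errors, and the 30-day window are assumptions made for illustration, and the search has not been run against real data.

```
index=rxc sourcetype=rxcapp status=5* endpoint=* earliest=-30d@d latest=@d
| bin _time span=1h
| stats count as errors by _time endpoint
| eval hour=strftime(_time, "%H")
| eval wday=strftime(_time, "%w")
| eval daytype=if(wday=="0" OR wday=="6", "weekend", "weekday")
| stats p90(errors) as p90_errors by endpoint hour daytype
| outputlookup endpoint_5xx_baseline.csv
```

Note that hours with no 5xx events at all produce no row in the first stats, so the p90 here is taken only over hours that had at least one error. That biases the threshold upward for quiet endpoints and may need handling if those matter.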
Thus, you can just read the lookup to find out what your daily threshold is.
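A minimal sketch of what the alert search might then look like, assuming the lookup is a CSV file (or lookup definition) with the hypothetical name endpoint_5xx_baseline.csv and the columns endpoint, hour, daytype and p90_errors. These names are illustrative and not something Splunk provides out of the box.

```
index=rxc sourcetype=rxcapp status=5* endpoint=* earliest=-1h@h latest=@h
| bin _time span=1h
| stats count as errors by _time endpoint
| eval hour=strftime(_time, "%H")
| eval wday=strftime(_time, "%w")
| eval daytype=if(wday=="0" OR wday=="6", "weekend", "weekday")
| lookup endpoint_5xx_baseline.csv endpoint hour daytype OUTPUT p90_errors
| where errors > p90_errors
```

Because a p90 is by definition exceeded about 10% of the time, the final comparison would probably need a margin (for example a multiplier on p90_errors, or a higher percentile) to avoid spam. Endpoints with no matching row in the lookup get a null p90_errors and are silently dropped by the where clause.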
If I stay with the interquartile approach I was trying, what modification to the query is needed to run it separately for business hours and non-business hours? Do you think that would solve my problem? Is the way I am trying to do this the right approach, or totally the wrong approach?
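For reference, one possible shape of that modification, as an untested sketch: tag each 2-minute bucket with a period flag and add that flag to the by clause, so the median and quartiles are computed separately per period. The 09:00-18:00 business-hours boundary, the period field name, and the 7-day lookback are assumptions. The lookback is widened because a 20-minute window never contains both periods, and eventstats replaces streamstats so that every bucket is compared against the statistics of the whole window.

```
index=rxc sourcetype=rxcapp status=5* endpoint=* earliest=-7d@d latest=now
| bin _time span=2m
| stats count as error by _time status endpoint
| eval hour=tonumber(strftime(_time, "%H"))
| eval period=if(hour>=9 AND hour<18, "business", "non_business")
| eventstats median(error) as med p75(error) as p75 p25(error) as p25 by status endpoint period
| eval iqr=p75-p25
| eval upper=med+iqr*1.5
| where error>upper AND _time>=relative_time(now(), "-20m@m")
| fields _time endpoint status period error upper med iqr
```

This only separates two coarse traffic regimes. It does not account for traffic changes within a period, so comparing an error rate (5xx count divided by total requests) instead of a raw count may still be worth considering.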