Hi Splunkers,
This may be easy, but I'm not able to solve it, if anyone can help.
I want to set a lower threshold to 15 standard deviation below the mean, and the upper threshold to 15 standard deviation above the mean, but I'm not sure how to implement that.
Thanks!
So this is what I have:
index=X sourcetype=Y source=metrics.kv_log appln_name IN ("FEED_FILE_ROUTE", "FEED_INGEST_ROUTE") this_hour="*"
| bin span=1h _time
| stats latest(this-hour) AS Volume BY appln_name, _time
| eval day_of_week=strftime(_time,"%A"), hour=strftime(_time,"%H")
| lookup mt_expected_processed_volume.csv name as appln_name, day_of_week, hour outputnew avg_volume, stdev_volume
The issue you may have with out of bounds data is that your mean average and standard deviation are imported from your csv file and may or may not relate to the actual data you are evaluating. You could try recalculating your mean and standard deviation to get a better understanding of what an outlier might be. Having said that, you might want to consider using the median (although you might want to bucket the volumes first) and consider the variance from the median instead, to identify outliers.
Try something like this
| where Volume > avg_volume - (stdev_volume * 15) AND Volume < avg_volume + (stdev_volume * 15)
Thanks ITWhisperer,
I don't actually want to filter right away, what I want to do is that I want to add a column for upper threshold and lower threshold and then a third column for if the Volume is within the threshold or not.
And thanks again for always helping !
| eval lower_bound = avg_volume - (stdev_volume * 15)
| eval upper_bound = avg_volume + (stdev_volume * 15)
| eval in_bounds = if (Volume > lower_bound AND Volume < upper_bound, "true", "false")
Thanks again, IT Whisperer !
This works perfect, except I want to grab a sum of in_bound= true VS in_bound=false for each app.
| eval in_bounds = if (Volume > lower_bound AND Volume < upper_bound, 1, 0)
| eval out_bounds = if (Volume > lower_bound AND Volume < upper_bound, 0, 1)
So I would have triggered the alert many times compared to the number of hours where it wouldn't have triggered.
How to play around with the standard deviation multiplication factor with (currently 15) to see if I can reduce the number of outbounds to inbounds.
FYI: app sum(in_bounds) SUM(OUT_BOUNDS)
Feed_file 12 701
Feed_ingest 199 501
The issue you may have with out of bounds data is that your mean average and standard deviation are imported from your csv file and may or may not relate to the actual data you are evaluating. You could try recalculating your mean and standard deviation to get a better understanding of what an outlier might be. Having said that, you might want to consider using the median (although you might want to bucket the volumes first) and consider the variance from the median instead, to identify outliers.
Standard deviation from mean works well with normal distributions - the fact that you have so many outside 15 standard deviations would seem to suggest that you do not have a normal distribution. Do you know what the distribution looks like?
Thanks for the follow-up. What is the distribution looks like, please?
Thanks
Depending on the values of your Volume field, you could bucket them into 100s (for example) and the count the frequencies of the various volume ranges
| bin Volume span=100s
| stats count by Volume