Solved: How to lower threshold to 15 standard deviation be...

majilan1 · ‎07-06-2022

Hi Splunkers,

This may be easy, but I'm not able to solve it, if anyone can help.

I want to set a lower threshold to 15 standard deviation below the mean, and the upper threshold to 15 standard deviation above the mean, but I'm not sure how to implement that.

Thanks!

So this is what I have:

 index=X  sourcetype=Y  source=metrics.kv_log  appln_name IN ("FEED_FILE_ROUTE", "FEED_INGEST_ROUTE")  this_hour="*"

| bin span=1h  _time

| stats latest(this-hour) AS Volume BY appln_name, _time

| eval day_of_week=strftime(_time,"%A"), hour=strftime(_time,"%H")

| lookup mt_expected_processed_volume.csv name as appln_name, day_of_week, hour outputnew avg_volume, stdev_volume

ITWhisperer · ‎07-10-2022

The issue you may have with out of bounds data is that your mean average and standard deviation are imported from your csv file and may or may not relate to the actual data you are evaluating. You could try recalculating your mean and standard deviation to get a better understanding of what an outlier might be. Having said that, you might want to consider using the median (although you might want to bucket the volumes first) and consider the variance from the median instead, to identify outliers.

View solution in original post

ITWhisperer · ‎07-06-2022

Try something like this

| where Volume > avg_volume - (stdev_volume * 15) AND Volume < avg_volume + (stdev_volume * 15)

majilan1 · ‎07-07-2022

Thanks ITWhisperer,

I don't actually want to filter right away, what I want to do is that I want to add a column for upper threshold and lower threshold and then a third column for if the Volume is within the threshold or not.

And thanks again for always helping !

ITWhisperer · ‎07-07-2022

| eval lower_bound = avg_volume - (stdev_volume * 15)
| eval upper_bound = avg_volume + (stdev_volume * 15)
| eval in_bounds = if (Volume > lower_bound AND Volume < upper_bound, "true", "false")

majilan1 · ‎07-07-2022

Thanks again, IT Whisperer !

This works perfect, except I want to grab a sum of in_bound= true VS in_bound=false for each app.

ITWhisperer · ‎07-07-2022

| eval in_bounds = if (Volume > lower_bound AND Volume < upper_bound, 1, 0)
| eval out_bounds = if (Volume > lower_bound AND Volume < upper_bound, 0, 1)

majilan1 · ‎07-08-2022

So I would have triggered the alert many times compared to the number of hours where it wouldn't have triggered.

How to play around with the standard deviation multiplication factor with (currently 15) to see if I can reduce the number of outbounds to inbounds.

FYI: app sum(in_bounds) SUM(OUT_BOUNDS)

Feed_file 12 701

Feed_ingest 199 501

ITWhisperer · ‎07-10-2022

The issue you may have with out of bounds data is that your mean average and standard deviation are imported from your csv file and may or may not relate to the actual data you are evaluating. You could try recalculating your mean and standard deviation to get a better understanding of what an outlier might be. Having said that, you might want to consider using the median (although you might want to bucket the volumes first) and consider the variance from the median instead, to identify outliers.

ITWhisperer · ‎07-08-2022

Standard deviation from mean works well with normal distributions - the fact that you have so many outside 15 standard deviations would seem to suggest that you do not have a normal distribution. Do you know what the distribution looks like?

majilan1 · ‎07-08-2022

Thanks for the follow-up. What is the distribution looks like, please?

Thanks

ITWhisperer · ‎07-08-2022

Depending on the values of your Volume field, you could bucket them into 100s (for example) and the count the frequencies of the various volume ranges

| bin Volume span=100s
| stats count by Volume

How to lower threshold to 15 standard deviation below the mean, and upper to 15 above the mean?

using Splunk Enterprise

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!