Splunk Enterprise

How to lower threshold to 15 standard deviation below the mean, and upper to 15 above the mean?

majilan1
Path Finder

Hi Splunkers,

This may be easy, but I'm not able to solve it, if anyone can help.

I want to set a lower threshold to 15 standard deviation below the mean, and the upper threshold to 15 standard deviation above the mean, but I'm not sure how to implement that. 

Thanks!

 So this is what I have: 

 index=X  sourcetype=Y  source=metrics.kv_log  appln_name IN ("FEED_FILE_ROUTE", "FEED_INGEST_ROUTE")  this_hour="*"

| bin span=1h  _time

| stats latest(this-hour) AS Volume BY appln_name, _time

| eval day_of_week=strftime(_time,"%A"), hour=strftime(_time,"%H")

| lookup mt_expected_processed_volume.csv name as appln_name, day_of_week, hour outputnew avg_volume, stdev_volume
Labels (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

The issue you may have with out of bounds data is that your mean average and standard deviation are imported from your csv file and may or may not relate to the actual data you are evaluating. You could try recalculating your mean and standard deviation to get a better understanding of what an outlier might be. Having said that, you might want to consider using the median (although you might want to bucket the volumes first) and consider the variance from the median instead, to identify outliers.

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this

| where Volume > avg_volume - (stdev_volume * 15) AND Volume < avg_volume + (stdev_volume * 15)
0 Karma

majilan1
Path Finder

Thanks ITWhisperer,

I don't actually want to filter right away, what I want to do is that I want to add a column for upper threshold and lower threshold and then a third column for if the Volume is within the threshold or not. 

And thanks again for always helping !

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| eval lower_bound = avg_volume - (stdev_volume * 15)
| eval upper_bound = avg_volume + (stdev_volume * 15)
| eval in_bounds = if (Volume > lower_bound AND Volume < upper_bound, "true", "false")
0 Karma

majilan1
Path Finder

Thanks again, IT Whisperer !

This works perfect, except I want to grab a sum of in_bound= true VS in_bound=false for each app.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| eval in_bounds = if (Volume > lower_bound AND Volume < upper_bound, 1, 0)
| eval out_bounds = if (Volume > lower_bound AND Volume < upper_bound, 0, 1)

majilan1
Path Finder

So I would have triggered the alert many times compared to the number of hours where it wouldn't have triggered.

How to play around with the standard deviation multiplication factor with (currently 15) to see if I can reduce the number of outbounds to inbounds.

FYI:   app                                                     sum(in_bounds)                                                             SUM(OUT_BOUNDS)

            Feed_file                                            12                                                                                            701

             Feed_ingest                                     199                                                                                              501                                                                         

                                                                                                                                                                      

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

The issue you may have with out of bounds data is that your mean average and standard deviation are imported from your csv file and may or may not relate to the actual data you are evaluating. You could try recalculating your mean and standard deviation to get a better understanding of what an outlier might be. Having said that, you might want to consider using the median (although you might want to bucket the volumes first) and consider the variance from the median instead, to identify outliers.

ITWhisperer
SplunkTrust
SplunkTrust

Standard deviation from mean works well with normal distributions - the fact that you have so many outside 15 standard deviations would seem to suggest that you do not have a normal distribution. Do you know what the distribution looks like?

0 Karma

majilan1
Path Finder

Thanks for the follow-up. What is the distribution looks like, please?

Thanks

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Depending on the values of your Volume field, you could bucket them into 100s (for example) and the count the frequencies of the various volume ranges

| bin Volume span=100s
| stats count by Volume
0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...