Splunk Search

Standard Deviation Total Requests Per Day

dickersons
Explorer

I am attempting to calculate the following:

-  Total Number "Requests Per Day"

-  Average/Mean "Requests Per Day"

-  Standard Deviation "Requests Per Day"

I am using the following search:

index=myCoolIndex cluster_name="myCoolCluster" sourcetype=myCoolSourceType label_app=myCoolApp ("\"statusCode\"")
| rex .*\"traceId\"\s:\s\"?(?<traceId>.*?)\".*
| dedup traceId
| rex "(?s)\"statusCode\"\s:\s\"?(?<statusCode>[245]\d{2})\"?"
| timechart span=1d count(statusCode) as "Number_Of_Requests"
| where Number_Of_Requests > 0
| eventstats mean(Number_Of_Requests) as "Average Requests Per Day" stdev(Number_Of_Requests) as "Standard Deviation"

I am getting results back, but I am unsure whether they are correct for what I am trying to calculate. For instance, I would have thought "stdev()" would need some eval statement to know what the "Total Requests Per Day" and "Average/Mean Requests Per Day" are. Does the "where Number_Of_Requests > 0" skew the results, since days with zero requests are dropped from the result set? I was hoping someone could take a look at my query and provide some insight into what I may still need to do to get an accurate standard deviation. Below is the output I am getting from the current query:

Number_Of_Requests    Average Requests Per Day    Standard Deviation
25687                 64395                       54741.378572337766
103103                64395                       54741.378572337766

 

Any help is appreciated!

 


ITWhisperer
SplunkTrust

By using "| where Number_Of_Requests > 0", you are potentially "skewing" the results, although that depends on what you are trying to show. For example, if you had 5 days with counts of 2, 0, 0, 0, 3, your mean would be 1 with the zeroes included and 2.5 without them. The standard deviation would be similarly affected by the removal or inclusion of the zeroes.
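The effect of the filter can be checked outside Splunk with a few lines of Python. This is only a sketch of the arithmetic in the example above, not of the search itself: the variable names are made up for illustration, and it assumes the sample standard deviation, which is what Python's statistics.stdev computes.

```python
import statistics

# Daily request counts from the example: five days, three of them with no requests.
counts_with_zeroes = [2, 0, 0, 0, 3]

# What is left after a filter equivalent to "where Number_Of_Requests > 0".
counts_without_zeroes = [c for c in counts_with_zeroes if c > 0]

mean_with = statistics.mean(counts_with_zeroes)
mean_without = statistics.mean(counts_without_zeroes)

# Sample standard deviation (divides by n - 1).
stdev_with = statistics.stdev(counts_with_zeroes)
stdev_without = statistics.stdev(counts_without_zeroes)

print("with zeroes:   ", mean_with, stdev_with)
print("without zeroes:", mean_without, stdev_without)
```

Dropping the zero days raises the mean from 1 to 2.5 and shrinks the standard deviation from about 1.41 to about 0.71, so the filtered figures describe "days that had traffic" rather than "all days".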

dickersons
Explorer

If I understand you correctly, the mean and standard deviation are likely to be a more accurate representation if I include the "0" days, so that every day is included in the calculation and not only the days that have data points?

ITWhisperer
SplunkTrust

Correct - it is usually more meaningful to include the zeroes, but it does depend on what you are trying to show.

dickersons
Explorer

Makes sense.  Does the formula itself look legit?  That is, assuming the search criteria are correct, should I get the correct standard deviation based on Requests Per Day?

ITWhisperer
SplunkTrust

Yes, you will get the mean and standard deviation of all the daily counts in your time period.
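As a sanity check, the two daily counts posted in the question can be run through the same statistics by hand. The Python sketch below is an independent check, not part of the search. It assumes that stdev() in Splunk is the sample standard deviation and stdevp() is the population one, which is how the Splunk documentation describes them as far as I know. The variable names are illustrative.

```python
import statistics

# The two daily counts from the output posted in the question.
daily_counts = [25687, 103103]

mean_per_day = statistics.mean(daily_counts)

# Sample standard deviation (n - 1 in the denominator).
sample_stdev = statistics.stdev(daily_counts)

# Population standard deviation (n in the denominator), shown for comparison.
population_stdev = statistics.pstdev(daily_counts)

print("mean:", mean_per_day)
print("sample stdev:", sample_stdev)
print("population stdev:", population_stdev)
```

The mean comes out to 64395 and the sample standard deviation to roughly 54741.38, which agrees with the posted output. The population figure is 38708, so the search is reporting the sample version. No extra eval is needed because eventstats computes both aggregates over all the rows that timechart produced and appends them to every row.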
