I am attempting to calculate the following:
- Total Number "Requests Per Day"
- Average/Mean "Requests Per Day"
- Standard Deviation "Requests Per Day"
I am using the following search:
index=myCoolIndex cluster_name="myCoolCluster" sourcetype=myCoolSourceType label_app=myCoolApp ("\"statusCode\"")
| rex .*\"traceId\"\s:\s\"?(?<traceId>.*?)\".*
| dedup traceId
| rex "(?s)\"statusCode\"\s:\s\"?(?<statusCode>[245]\d{2})\"?"
| timechart span=1d count(statusCode) as "Number_Of_Requests"
| where Number_Of_Requests > 0
| eventstats mean(Number_Of_Requests) as "Average Requests Per Day" stdev(Number_Of_Requests) as "Standard Deviation"
I am getting results back, but I am not sure they are correct for what I am trying to measure. For instance, I would have thought "stdev()" would need an eval statement so that it knows the "Total Requests Per Day" and the "Average/Mean Requests Per Day". Also, does "where Number_Of_Requests > 0" skew the results, since days with zero requests are dropped from the result set? I was hoping someone could look at my query and point out what I may still need to do to get an accurate standard deviation. Below is the output from the current query:
Number_Of_Requests   Average Requests Per Day   Standard Deviation
25687                64395                      54741.378572337766
103103               64395                      54741.378572337766
Any help is appreciated!
By using "| where Number_Of_Requests > 0", you are potentially "skewing" the results, although that depends on what you are trying to show. For example, if you had 5 days with counts of 2, 0, 0, 0, 3, your mean would be 1 with the zeroes included and 2.5 without them. The standard deviation changes in the same way, depending on whether the zeroes are removed or included.
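To make the 5-day example above concrete, here is a small Python sketch of the same arithmetic. It is only an illustration outside Splunk: it assumes stdev() in SPL is the sample standard deviation (dividing by n - 1), which is what Python's statistics.stdev computes, and the variable names are made up for this example.

```python
import statistics

# Daily counts from the example: 5 days, three of them with no requests.
daily_counts = [2, 0, 0, 0, 3]

# Equivalent of "| where Number_Of_Requests > 0": drop the zero days.
non_zero_counts = [c for c in daily_counts if c > 0]

mean_with_zeroes = statistics.mean(daily_counts)
mean_without_zeroes = statistics.mean(non_zero_counts)

# Sample standard deviation (n - 1 in the denominator).
stdev_with_zeroes = statistics.stdev(daily_counts)
stdev_without_zeroes = statistics.stdev(non_zero_counts)

print(mean_with_zeroes, round(stdev_with_zeroes, 4))        # 1 1.4142
print(mean_without_zeroes, round(stdev_without_zeroes, 4))  # 2.5 0.7071
```

Both the mean and the standard deviation move once the zero days are filtered out, so the "where" clause changes what is being measured, not just how many rows are displayed.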
If I understand you correctly, I am likely to get a more accurate mean and standard deviation if I include the "0" days, so that every day is included in the calculation and not only the days that have data points?
Correct - it is usually more meaningful to include the zeroes, but it does depend on what you are trying to show.
Makes sense. Does the formula itself look legit? That is, assuming the search criteria are correct, should I get the correct standard deviation based on Requests Per Day?
Yes, you will get the mean and standard deviation of all the daily counts in your time period.
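As a sanity check, the numbers posted in the question can be reproduced outside Splunk. The Python sketch below assumes the two rows shown are the entire result set after the "where" filter, and that stdev() is the sample standard deviation (n - 1 denominator); the variable names are only for illustration.

```python
import statistics

# The two daily counts from the posted output.
number_of_requests = [25687, 103103]

average_per_day = statistics.mean(number_of_requests)
# Sample standard deviation of the daily counts.
standard_deviation = statistics.stdev(number_of_requests)

print(average_per_day)     # 64395
print(standard_deviation)  # approximately 54741.3786
```

Both values agree with the "Average Requests Per Day" and "Standard Deviation" columns in the output, which is consistent with eventstats computing them directly from the daily counts without any extra eval.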