Is there a way to using conditions to find all the values (SUM and COUNT) above a certain value to be used as part of another calculation?
My logic is to find all the values greater than or equal to the 99th percentile value from a query (e.g. if I have a 1000 queries, I'd want the sum of the last 10 queries), take that value away from the sum of the total query count and divide that by the total count minus the count of queries that are greater than equal to the 99th percentile.
I can replicate my idea on Excel with the idea I had in mind:
But I'm not sure how to replicate this on Splunk - any ideas?
You are trying too hard to do all the work yourself.
On the other one, I posted one version of code to calculate the avg of the 99th percentile. Here's a more direct route - use an
eval in the stats to
null out the ones you dont want to aggregate.
your query that gets the data by queryName | fields ResponseTime queryName | eventstats perc99(ResponseTime) as p99Resp by queryName | stats count(eval(if(ResponseTime<-p99Resp,Responsetime,null()))) as P98Count, avg(eval(if(ResponseTime<-p99Resp,Responsetime,null()))) as p98ResponseAvg, stdev(eval(if(ResponseTime<-p99Resp,Responsetime,null()))) as p98ResponseStdev, min(eval(if(ResponseTime<-p99Resp,Responsetime,null()))) as p98ResponseMin, max(eval(if(ResponseTime<-p99Resp,Responsetime,null()))) as p98ResponseMax by queryName
So what does the eval side do, remove any queries that have a response time that exceed the p99 value for the queryName? I think I saw that there was an IF function available, but I didn't know you needed to use '<-' for less than or equal to
I presume this is related to your previous question. 🙂 (Which is not a problem, they do appear somewhat distinct - if the answer to one is the same or similar to the answer to the other we can figure out how to handle that later).
An example may help - please correct where this is wrong. Example will use 10 values and a 90%, and I'm going to make up most of the math because I'm lazy.
Ten events with some values: 1, 5, 6, 7, 8, 9, 10,11,12,18
Your 90% says that the values 1 and 18 are not inside the 90% <- made up.
What exactly do you want to do with this example, then?
Sum 1+18 = 19,
then take that 19 away from 5+6+7+...+12=68 == 49 ?
OR then take that 19 away from the WHOLE sum of 87-19 = 68 ? <-- this, I think?
Divide that (68) by the total count (10?) == 6.8,
oh no, 68/8 because you are missing two that got removed? == 8.5 <-- this I think?
That's 8.5. Since my data was more or less symmetrical about the average that's not much different from the average of the whole set (8.7) but certainly different data will give different results.
So is that what you are trying to do?
If not, please correct the example to do what you want. I think this will help a LOT of people help you find an answer.
Yes, they're related to each other - I had an idea pop into my head for the first one, but I thought I'd write the question differently.
The example you provided is almost what I'm looking for - I'll explain what I mean:
I'm looking to use all but the last value so, in this case, the 10th value would be removed but the search include the first value as well. My idea is that either, I could get the sum of the counts that are within the 99% range (or here, it would be all the values <= 12) and then dividing by 9 (since 10 - 1 = 9)
The results would be: Sum=69 and new Average=7.7 (rounded to 1 d.p)
Yes, ideally, once I get all these averages, I can do further analysis on the ones that exceed a certain value.