topic Re: Find the average of the first 99% range of data in Splunk Search
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301248#M90710
<P>Have you looked through "<A href="http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/Search/Findingandremovingoutliers">finding and removing outliers</A>"? That also references the <A href="http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/SearchReference/Outlier">outlier command</A>, which may be of use too.</P>Fri, 25 Aug 2017 13:06:31 GMTRichfez2017-08-25T13:06:31ZFind the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301247#M90709
<P>My search currently gives me some statistics regarding response times including total count, average, min, max and 99% percentile value (I'm assuming that this is ordered) based on one of the fields from the query.</P>
<P>I need to find the average query count for the first 99% of my data count per query name, so my logic was to take the minimum value from the 99% percentile value, and dividing that by 99% of the total count. But when I run this, I often get values for this not in the range between minimum and maximum for certain values, <STRONG>e.g. Average: 15533.5, Min: 9076, Max: 24737, *99% Average: 479.0</STRONG>*, which clearly wouldn't make sense even if the numbers are heavily skewed.</P>
<P>This was my initial search attempt:</P>
<P>index=* ... <BR />
| stats count AS "Query Count", avg(ResponseTime) AS "AvgRespTime", min(ResponseTime) AS "MinRespTime", max(ResponseTime) AS "MaxRespTime", p99(ResponseTime) AS "99PercRespTime" by <EM>queryName<BR />
| eval AvgRespTime=round(AvgRespTime,1), 99PercAvgRespTime=('99PercRespTime'-'MinRespTime')/(0.99</EM>'Query Count'), 99PercAvgRespTime=round('99PercAvgRespTime',1)<BR />
| sort - 99PercAvgRespTime</P>
<P>*EDIT: I've seen a flaw in my initial idea where I'm taking single values rather than sums, which are how you get averages (oops) - what I'd ideally want is to take the sum of the 99th percentile (and above) value(s) away from the total sum and divide that by the count of queries minus the count of queries above (and including) the 99th percentile. *</P>
<P>(1) Is it even possible to accurately get the average of the first 99% of the data? My logic was that by removing the last 1% of queries (so if it was 1000, the 10 queries with the highest response times), I can get this average but what should the query be? I considered taking 99% average, but I assume that would remove 0.5% from the lower end as well as the upper end, which is what I've been asked to avoid.</P>
<P>(2) Is there a way to show skewness of data based off statistics on Splunk? If possible, I'd want to show a graph or at least data showing this if possible.</P>Fri, 25 Aug 2017 11:15:55 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301247#M90709sepkarimpour2017-08-25T11:15:55ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301248#M90710
<P>Have you looked through "<A href="http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/Search/Findingandremovingoutliers">finding and removing outliers</A>"? That also references the <A href="http://docs.splunk.com/Documentation/SplunkCloud/6.6.1/SearchReference/Outlier">outlier command</A>, which may be of use too.</P>Fri, 25 Aug 2017 13:06:31 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301248#M90710Richfez2017-08-25T13:06:31ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301249#M90711
<P>I'm currently checking the 'outlier' command - I'm not sure how you can set the upper bound to 99% though. My concern is when running the query now, I can see multiple query names, which all have different counts so I'm not sure how to set a limit. I'll try using this and see if I can get an answer.</P>Fri, 25 Aug 2017 13:33:35 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301249#M90711sepkarimpour2017-08-25T13:33:35ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301250#M90712
<P>Does if have to be "the last" portion? If not, check out the (newish) <CODE>Event Sampling</CODE> feature that is a blue link on the left side towards the middle under the histogram/timeline and above the <CODE>Event/Patterns/Statistics/Visualization</CODE> tabs.</P>Fri, 25 Aug 2017 13:58:40 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301250#M90712woodcock2017-08-25T13:58:40ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301251#M90713
<P>That's what I've been asked to provide - using 'Event Sampling' will show a random set of results, and given some of the query names have single digit results and some have thousands so that wouldn't work for this.</P>Fri, 25 Aug 2017 14:11:08 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301251#M90713sepkarimpour2017-08-25T14:11:08ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301252#M90714
<P>@sepkarimpour - what do you mean by "first 99%"? do you mean the data that is below the 99th percentile?</P>Fri, 25 Aug 2017 17:36:45 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301252#M90714DalJeanis2017-08-25T17:36:45ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301253#M90715
<P>It's not very difficult, just use eventstats to get the 99the percentile and then act accordingly</P>
<PRE><CODE>your query that gets the data by queryName
| fields ResponseTime queryName
| eventstats perc99(ResponseTime) as p99Resp by queryName
| where ResponseTime<=p99Resp
| rename COMMENT as "the above gives you only those records which are below the 99th percentile"
| rename COMMENT as "you can now analyze them by themselves"
| stats count as P98Count, avg(ResponseTime) as p98ResponseAvg, stdev(ResponseTime) as p98ResponseStdev,
min(ResponseTime) as p98ResponseMin, max(ResponseTime) as p98ResponseMax
by queryName
</CODE></PRE>
<P>...where p98 is the population below the 99th percentile.</P>
<P>...or... </P>
<PRE><CODE>your query that gets the data by queryName
| fields ResponseTime queryName
| eventstats perc99(ResponseTime) as p99Resp by queryName
| eval p00Unit=1
| eval p98Unit=if(ResponseTime<=p99Resp,1,0)
| eval p99Unit=if(ResponseTime<=p99Resp,0,1)
| eval p00Response=ResponseTime
| eval p98Response=if(ResponseTime<=p99Resp,ResponseTime,null())
| eval p99Response=if(ResponseTime<=p99Resp,null(),ResponseTime)
| stats sum(p00Unit) as P00Count, avg(p00Response) as p00ResponseAvg, stdev(p00Response) as p00ResponseStdev,
min(p00Response) as p00ResponseMin, max(p00Response) as p00ResponseMax,
sum(p98Unit) as P98Count, avg(p98Response) as p98ResponseAvg, stdev(p98Response) as p98ResponseStdev,
min(p98Response) as p98ResponseMin, max(p98Response) as p98ResponseMax,
sum(p99Unit) as P99Count, avg(p99Response) as p99ResponseAvg, stdev(p99Response) as p99ResponseStdev,
min(p99Response) as p99ResponseMin, max(p99Response) as p99ResponseMax
by queryName
</CODE></PRE>
<P>...where p00 is the entire population, p98 is the population below the 99th percentile, and p99 is the population above the 99th percentile.</P>Fri, 25 Aug 2017 18:08:23 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301253#M90715DalJeanis2017-08-25T18:08:23ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301254#M90716
<P>I tweaked the first query so it'd work with the initial search - I was running a comparison of the query counts and I can see whilst some of the query names that have fewer queries don't lose any, some queries are losing more than 1% from the top (e.g. the query with the highest count initially has a count of 152638, but when running your version, it falls to 149949, which is a drop of around 1.8%) - is this expected? </P>
<P>Also, is there a way I can still display the 100% values for certain items, i.e. Query Count, Min and Max but show the 99% Average Response as well? I'd like to show that although some queries may take longer (as outiers), the majority (99%) of searches have a low response time, but it's more of a preference.</P>Tue, 29 Aug 2017 09:13:39 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301254#M90716sepkarimpour2017-08-29T09:13:39ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301255#M90717
<P>I think so. I want to eliminate the top 1% of queries that take the longest time in order to show that all the other queries below this have a low response time - general analysis of this to show whether this'll remove the tail-end outliers that could be affecting the average OR to show that these queries have a higher response time in general so we can assess what could be causing this.</P>Tue, 29 Aug 2017 09:17:41 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301255#M90717sepkarimpour2017-08-29T09:17:41ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301256#M90718
<P>I ended up doing it so it was just the values less than or equal to the 99th percentile value for 'x':</P>
<P>index=* ... "OK" <BR />
| eventstats perc99(ResponseTime) as p99Resp by x<BR />
| where ResponseTime<=p99Resp<BR />
| stats min(ResponseTime) as p98ResponseMin, max(ResponseTime) as p98ResponseMax, avg(ResponseTime) as p98AvgRespTime by x</P>
<P>For average, I imagine using an 'eval' to get the value may be necessary.</P>Tue, 12 Sep 2017 08:52:15 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301256#M90718sepkarimpour2017-09-12T08:52:15ZRe: Find the average of the first 99% range of data
https://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301257#M90719
<P>@sepkarimpour - yes, that result is mathematically possible, especially if the top end of the data has lots of clumping. To make up a simple example, let's suppose the data was one each of the numbers from 1 to 97, then three 100s. the 99th percentile is 100, so only 97% of the data are below the 99th percentile.</P>
<P>Yes, you can do all kinds of things like that easily. I already did that for you in the last example, where P00 is the whole data, p98 is the data below the 99th, and p99 is the data in the 99th. </P>
<P>You can also use the percentile aggregate functions p25(), p50()/median(), and p75() against the p00 data if you'd like, to show the quartiles.</P>Tue, 12 Sep 2017 15:15:39 GMThttps://community.splunk.com/t5/Splunk-Search/Find-the-average-of-the-first-99-range-of-data/m-p/301257#M90719DalJeanis2017-09-12T15:15:39Z