<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Percentile Implementation in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106643#M27756</link>
    <description>&lt;P&gt;Thanks for the speedy response.&lt;/P&gt;</description>
    <pubDate>Tue, 03 Apr 2012 00:13:03 GMT</pubDate>
    <dc:creator>sohrab</dc:creator>
    <dc:date>2012-04-03T00:13:03Z</dc:date>
    <item>
      <title>Percentile Implementation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106641#M27754</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;

&lt;P&gt;I am wondering which percentile implementation Splunk uses (in stats, etc.). It does not always return the same results as Excel's or as what I calculate manually (which may be interpolated).&lt;/P&gt;

&lt;P&gt;Is it the function from scipy.stats, or is it a custom function? If it is custom, is it possible to get the formula?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 07:26:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106641#M27754</guid>
      <dc:creator>sohrab</dc:creator>
      <dc:date>2012-04-02T07:26:12Z</dc:date>
    </item>
    <item>
      <title>Re: Percentile Implementation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106642#M27755</link>
      <description>&lt;P&gt;If there are fewer than 1000 distinct values, the percentiles use the nearest rank algorithm (see &lt;A href="http://en.wikipedia.org/wiki/Percentile#Nearest_rank"&gt;http://en.wikipedia.org/wiki/Percentile#Nearest_rank&lt;/A&gt;).  Excel uses the NIST interpolated algorithm, which means you can get a percentile value that does not exist in the actual data; that is not possible with the nearest rank approach.  You can ask Splunk to use the Excel method instead via a limits.conf setting: under [stats], set perc_method=interpolated (vs 'nearest-rank').  See the limits.conf.spec entry for more detailed info.&lt;/P&gt;
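
&lt;P&gt;To illustrate the difference between the two methods, here is a minimal Python sketch. It is not Splunk's code: the function names are hypothetical, the nearest-rank variant assumes the 0-based rank floor((percentile/100)*count) described in limits.conf.spec, and the interpolated variant assumes plain linear interpolation between the two neighbouring ranks.&lt;/P&gt;

```python
import math


def nearest_rank(values, percentile):
    """Nearest-rank percentile: always returns a value present in the data."""
    ordered = sorted(values)
    # Assumed 0-based rank: floor((percentile / 100) * count).
    rank = math.floor((percentile / 100) * len(ordered))
    # Clamp so that percentile=100 still maps to the last element.
    rank = min(rank, len(ordered) - 1)
    return ordered[rank]


def interpolated(values, percentile):
    """Linear interpolation between the two ranks around the target position."""
    ordered = sorted(values)
    f = (percentile / 100) * (len(ordered) - 1)
    lo, hi = math.floor(f), math.ceil(f)
    frac = f - lo
    # Weight the two neighbouring values by their distance from f.
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


data = [10, 20, 30, 40]
print(nearest_rank(data, 50))  # 30, a value that exists in the data
print(interpolated(data, 50))  # 25.0, a value that does not exist in the data
```

&lt;P&gt;With four values, the two methods already disagree on the median, which is the kind of mismatch with Excel described in the question.&lt;/P&gt;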

&lt;P&gt;If there are more than 1000 distinct values for the field, the percentiles are approximated using a custom radix-tree digest based algorithm that is much faster and uses much less memory (a constant amount) than an exact computation (which uses memory in linear relation to the number of distinct values).  By default this approach limits the rank error of the approximation to &amp;lt; 1%.  That means if you ask for, e.g., the 95th percentile, the number you get back is between the 94th and 96th percentile.&lt;/P&gt;
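
&lt;P&gt;To make the rank-error bound concrete, the following Python sketch computes the window of values that an estimate could fall in. It does not implement the digest itself; the function names and the sample data are hypothetical, and the exact lookup assumes a 0-based nearest-rank rule.&lt;/P&gt;

```python
def value_at_percentile(ordered, percentile):
    """Exact nearest-rank lookup in an already sorted list."""
    # Integer arithmetic avoids floating-point rounding in the rank.
    rank = (percentile * len(ordered)) // 100
    return ordered[min(rank, len(ordered) - 1)]


def rank_error_window(values, percentile, rank_error=1):
    """Range of values an estimate may take under the given rank error."""
    ordered = sorted(values)
    low = value_at_percentile(ordered, max(percentile - rank_error, 0))
    high = value_at_percentile(ordered, min(percentile + rank_error, 100))
    return low, high


data = list(range(1, 2001))  # 2000 distinct values: 1, 2, ..., 2000
print(rank_error_window(data, 95))  # (1881, 1921)
```

&lt;P&gt;For this sample, an approximate 95th percentile with a 1% rank error would land somewhere between the exact 94th and 96th percentile values.&lt;/P&gt;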

&lt;P&gt;You can always get the exact percentiles, even for more than 1000 distinct values, by using 'exactperc' instead of 'perc'.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Apr 2012 23:20:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106642#M27755</guid>
      <dc:creator>steveyz</dc:creator>
      <dc:date>2012-04-02T23:20:25Z</dc:date>
    </item>
    <item>
      <title>Re: Percentile Implementation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106643#M27756</link>
      <description>&lt;P&gt;Thanks for the speedy response.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2012 00:13:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106643#M27756</guid>
      <dc:creator>sohrab</dc:creator>
      <dc:date>2012-04-03T00:13:03Z</dc:date>
    </item>
    <item>
      <title>Re: Percentile Implementation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106644#M27757</link>
      <description>&lt;P&gt;For additional reference, in 4.3.2 you can find further details in the following files.&lt;/P&gt;

&lt;P&gt;$SPLUNK_HOME/etc/system/default/searchbnf.conf&lt;/P&gt;

&lt;P&gt;[stats-perc]&lt;BR /&gt;
syntax = (perc|p|exactperc|upperperc)&amp;lt;int&amp;gt;&lt;BR /&gt;
simplesyntax = perc&amp;lt;int&amp;gt;&lt;BR /&gt;
description = The n-th percentile value of this field.  perc&amp;lt;int&amp;gt;, p&amp;lt;int&amp;gt;, and upperperc&amp;lt;int&amp;gt; give approximate values for the integer percentile requested.  The approximation algorithm we use provides a strict bound of the actual value at for any percentile.  perc&amp;lt;int&amp;gt; and p&amp;lt;int&amp;gt; return a single number that represents the lower end of that range while upperperc&amp;lt;int&amp;gt; gives the approximate upper bound.  exactperc&amp;lt;int&amp;gt; provides the exact value, but will be very expensive for high cardinality fields.&lt;/P&gt;

&lt;P&gt;$SPLUNK_HOME/etc/system/README/limits.conf.spec&lt;/P&gt;

&lt;P&gt;perc_method = nearest-rank|interpolated&lt;BR /&gt;
* Which method to use for computing percentiles (and medians=50 percentile).&lt;BR /&gt;
* nearest-rank picks the number with 0-based rank R = floor((percentile/100)*count)&lt;BR /&gt;
* interpolated means given F = (percentile/100)*(count-1), pick ranks R1 = floor(F) and R2 = ceiling(F).  Answer = (R2 * (F - R1)) + (R1 * (1 - (F - R1)))&lt;BR /&gt;
* See wikipedia percentile entries on nearest rank and "alternative methods" &lt;BR /&gt;
* Defaults to interpolated&lt;/P&gt;</description>
      <pubDate>Tue, 29 May 2012 17:12:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106644#M27757</guid>
      <dc:creator>Ellen</dc:creator>
      <dc:date>2012-05-29T17:12:15Z</dc:date>
    </item>
    <item>
      <title>Re: Percentile Implementation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106645#M27758</link>
      <description>&lt;P&gt;@steveyz, &lt;BR /&gt;
Dear Steve, is it possible for us to get a sneak peek into the rdigest algorithm, or any "custom-built radix tree digest algorithm", for learning purposes? In 6.4 we see that by default Splunk uses the "closerank" algorithm over "interpolated".&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2017 13:46:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Percentile-Implementation/m-p/106645#M27758</guid>
      <dc:creator>sundarrajan</dc:creator>
      <dc:date>2017-02-09T13:46:08Z</dc:date>
    </item>
  </channel>
</rss>

