<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to group by percentile? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131115#M35763</link>
    <description>&lt;P&gt;Sure, it just adds a bit of post-processing:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| eventstats p99(total) as p99 p80(total) as p80 p50(total) as p50
| eval Percentile = case(total &amp;gt;= p99, "99%", total &amp;gt;= p80, "80%", total &amp;gt;= p50, "50%", 1=1, "0%")
| stats count sum(total) as total sum(sessions) as sessions by Percentile
| sort - Percentile
| streamstats sum(*) as running_*
| eval average = running_total / running_count
| eval sessions = running_sessions / running_count
| table Percentile average sessions
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Mon, 14 Apr 2014 20:06:49 GMT</pubDate>
    <dc:creator>martin_mueller</dc:creator>
    <dc:date>2014-04-14T20:06:49Z</dc:date>
    <item>
      <title>How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131108#M35756</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;
I'm new to Splunk and I'm quite stuck on how to group users by percentile. &lt;/P&gt;

&lt;P&gt;Each user has the option of paying for services and I want to group these users by their payment percentile. So if the max anyone has cumulatively paid is $100, they would show up in the 99th percentile while the 50th percentile would be someone who paid $50 or more. Based on that grouping I want to know the average number of sessions they've logged in my app.&lt;/P&gt;

&lt;P&gt;It'd look something like this:&lt;BR /&gt;
&lt;TABLE border="1"&gt;&lt;BR /&gt;
&lt;TBODY&gt;&lt;TR&gt;&lt;BR /&gt;
&lt;TD&gt;Percentile&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;Avg Payment&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;Sessions&lt;/TD&gt;&lt;BR /&gt;
&lt;/TR&gt;&lt;BR /&gt;
&lt;TR&gt;&lt;BR /&gt;
&lt;TD&gt;99%&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;99&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;50&lt;/TD&gt;&lt;BR /&gt;
&lt;/TR&gt;&lt;BR /&gt;
&lt;TR&gt;&lt;BR /&gt;
&lt;TD&gt;80%&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;82&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;44&lt;/TD&gt;&lt;BR /&gt;
&lt;/TR&gt;&lt;BR /&gt;
&lt;TR&gt;&lt;BR /&gt;
&lt;TD&gt;50%&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;60&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;25&lt;/TD&gt;&lt;BR /&gt;
&lt;/TR&gt;&lt;BR /&gt;
&lt;TR&gt;&lt;BR /&gt;
&lt;TD&gt;0%&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;0&lt;/TD&gt;&lt;BR /&gt;
&lt;TD&gt;60&lt;/TD&gt;&lt;BR /&gt;
&lt;/TR&gt;&lt;BR /&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/P&gt;

&lt;P&gt;I would be grateful for any help on how to achieve this. Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 18:15:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131108#M35756</guid>
      <dc:creator>gtran</dc:creator>
      <dc:date>2014-04-14T18:15:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131109#M35757</link>
      <description>&lt;P&gt;so does that mean 99% is the person who has paid 100$ and the one paid 50$ they are in 50%? little confusing. You could try &lt;/P&gt;

&lt;P&gt;eval per=case&lt;/P&gt;

&lt;P&gt;or&lt;/P&gt;

&lt;P&gt;rangemap&lt;/P&gt;

&lt;P&gt;i guess that is the case here.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 18:27:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131109#M35757</guid>
      <dc:creator>linu1988</dc:creator>
      <dc:date>2014-04-14T18:27:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131110#M35758</link>
      <description>&lt;P&gt;It is a bit confusing... The top 50% would be both the person who paid $50 and the person who paid $100 while the 99th percentile is the top 1% so just the guy who paid $100.&lt;/P&gt;

&lt;P&gt;I have not looked into rangemap yet, thanks for the tip!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 18:31:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131110#M35758</guid>
      <dc:creator>gtran</dc:creator>
      <dc:date>2014-04-14T18:31:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131111#M35759</link>
      <description>&lt;P&gt;will it be&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;....|eval percentile=case(Payment=="50$" AND Payment=="100$","50%",Payment="100$","100%",cases..)|stats count(Sessions) as Session by percentile,Payment&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 18:38:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131111#M35759</guid>
      <dc:creator>linu1988</dc:creator>
      <dc:date>2014-04-14T18:38:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131112#M35760</link>
      <description>&lt;P&gt;Could you post some sample events?&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 19:37:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131112#M35760</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2014-04-14T19:37:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131113#M35761</link>
      <description>&lt;P&gt;Here's what you do, assuming your data has an event per session with a user ID and the amount paid for that session:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| eventstats p99(total) as p99 p80(total) as p80 p50(total) as p50
| eval Percentile = case(total &amp;gt;= p99, "99%", total &amp;gt;= p80, "80%", total &amp;gt;= p50, "50%", 1=1, "0%")
| stats avg(total) as average avg(sessions) as sessions by Percentile
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The inputlookup loads my dummy data, see below.&lt;BR /&gt;
The stats calculates the number of sessions and total amount paid per user.&lt;BR /&gt;
The eventstats uses those totals to calculate where the percentile borders are.&lt;BR /&gt;
The eval groups the users into their percentile buckets, without duplication - if you're in the 99% bucket you're not also in all the other buckets.&lt;BR /&gt;
The final stats computes the average total amount and average number of sessions per percentile bucket.&lt;/P&gt;

&lt;P&gt;Sample data looks like this:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;session,user,amount&lt;BR /&gt;
1,a,10&lt;BR /&gt;
2,a,15&lt;BR /&gt;
3,a,20&lt;BR /&gt;
4,b,0&lt;BR /&gt;
5,c,100&lt;BR /&gt;
6,d,10&lt;BR /&gt;
7,e,20&lt;BR /&gt;
8,e,10&lt;BR /&gt;
9,f,1&lt;BR /&gt;
10,g,5&lt;BR /&gt;
11,h,6&lt;BR /&gt;
12,i,7&lt;BR /&gt;
13,j,8&lt;BR /&gt;
14,k,9&lt;BR /&gt;
15,l,10&lt;BR /&gt;
16,m,11&lt;BR /&gt;
17,n,12&lt;BR /&gt;
18,o,13&lt;BR /&gt;
19,p,14&lt;BR /&gt;
20,q,15&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Output based on that sample data:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;Percentile    average  sessions&lt;BR /&gt;
0%           5.142857         1&lt;BR /&gt;
50%         11.666667         1&lt;BR /&gt;
80%         30.000000         2&lt;BR /&gt;
99%        100.000000         1&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;To cross-check, let's look at the 80% bucket. I've got 17 users, so the lower-bound 80% bucket should be the top four people. The top one user is already in the 99% bucket, so we're looking at 2, 3, and 4 - they're a, e, and q and spent 45, 30, and 15 respectively giving you an average of 30.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 19:43:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131113#M35761</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-04-14T19:43:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131114#M35762</link>
      <description>&lt;P&gt;Thanks so much for this explanation. I think my biggest mistake was using stats to calculate the actual percentile values instead of eventstats. Is there any way for a user to be included in multiple groups? So the top user would be in the 99%, 80% and 50%?&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 19:58:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131114#M35762</guid>
      <dc:creator>gtran</dc:creator>
      <dc:date>2014-04-14T19:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131115#M35763</link>
      <description>&lt;P&gt;Sure, it just adds a bit of post-processing:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| eventstats p99(total) as p99 p80(total) as p80 p50(total) as p50
| eval Percentile = case(total &amp;gt;= p99, "99%", total &amp;gt;= p80, "80%", total &amp;gt;= p50, "50%", 1=1, "0%")
| stats count sum(total) as total sum(sessions) as sessions by Percentile
| sort - Percentile
| streamstats sum(*) as running_*
| eval average = running_total / running_count
| eval sessions = running_sessions / running_count
| table Percentile average sessions
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 14 Apr 2014 20:06:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131115#M35763</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-04-14T20:06:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to group by percentile?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131116#M35764</link>
      <description>&lt;P&gt;That yields this table for my dummy data:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;Percentile    average  sessions&lt;BR /&gt;
99%               100          1&lt;BR /&gt;
80%         47.500000   1.750000&lt;BR /&gt;
50%                26   1.300000&lt;BR /&gt;
0%          17.411765   1.176471&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;As a cross-check, running this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| stats avg(total)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;yields an average total per user of 17.411765, just as predicted by the 0% value &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 20:08:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-group-by-percentile/m-p/131116#M35764</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2014-04-14T20:08:25Z</dc:date>
    </item>
  </channel>
</rss>

