Splunk Search

How to group by percentile?

gtran
New Member

Hi,
I'm new to Splunk and I'm quite stuck on how to group users by percentile.

Each user has the option of paying for services, and I want to group these users by their payment percentile. So if the most anyone has cumulatively paid is $100, they would show up in the 99th percentile, while the 50th percentile would be anyone who has paid $50 or more. Based on that grouping, I want to know the average number of sessions they've logged in my app.

It'd look something like this:

Percentile   Avg Payment   Sessions
99%          99            50
80%          82            44
50%          60            25
0%           0             60

I would be grateful for any help on how to achieve this. Thanks!


martin_mueller
SplunkTrust

Here's what you do, assuming your data has an event per session with a user ID and the amount paid for that session:

| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| eventstats p99(total) as p99 p80(total) as p80 p50(total) as p50
| eval Percentile = case(total >= p99, "99%", total >= p80, "80%", total >= p50, "50%", 1=1, "0%")
| stats avg(total) as average avg(sessions) as sessions by Percentile

The inputlookup loads my dummy data, see below.
The stats calculates the number of sessions and total amount paid per user.
The eventstats uses those totals to calculate where the percentile borders are.
The eval groups the users into their percentile buckets, without duplication - if you're in the 99% bucket you're not also in all the other buckets.
The final stats computes the average total amount and average number of sessions per percentile bucket.
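The bucketing logic is easy to verify outside Splunk as well. Below is a Python sketch of the same steps over the sample data in this post. Note that this is an illustration, not Splunk code, and that Splunk's p99()/p80()/p50() functions use their own percentile estimation; the nearest-rank method here happens to reproduce the output shown further down for this small data set.

```python
import math
from collections import defaultdict

# Dummy data: (session, user, amount) -- same rows as the CSV below.
rows = [(1, "a", 10), (2, "a", 15), (3, "a", 20), (4, "b", 0), (5, "c", 100),
        (6, "d", 10), (7, "e", 20), (8, "e", 10), (9, "f", 1), (10, "g", 5),
        (11, "h", 6), (12, "i", 7), (13, "j", 8), (14, "k", 9), (15, "l", 10),
        (16, "m", 11), (17, "n", 12), (18, "o", 13), (19, "p", 14), (20, "q", 15)]

# | stats count as sessions sum(amount) as total by user
sessions = defaultdict(int)
total = defaultdict(int)
for _, user, amount in rows:
    sessions[user] += 1
    total[user] += amount

def nearest_rank(values, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    s = sorted(values)
    return s[max(0, math.ceil(p / 100 * len(s)) - 1)]

# | eventstats p99(total) as p99 p80(total) as p80 p50(total) as p50
totals = list(total.values())
p99, p80, p50 = (nearest_rank(totals, p) for p in (99, 80, 50))

# | eval Percentile = case(...): each user lands in exactly one bucket
def bucket(t):
    if t >= p99: return "99%"
    if t >= p80: return "80%"
    if t >= p50: return "50%"
    return "0%"

# | stats avg(total) as average avg(sessions) as sessions by Percentile
groups = defaultdict(list)
for user, t in total.items():
    groups[bucket(t)].append((t, sessions[user]))

result = {b: (sum(t for t, _ in g) / len(g), sum(s for _, s in g) / len(g))
          for b, g in groups.items()}
for b in ("99%", "80%", "50%", "0%"):
    print(b, result[b])
```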

Sample data looks like this:

session,user,amount
1,a,10
2,a,15
3,a,20
4,b,0
5,c,100
6,d,10
7,e,20
8,e,10
9,f,1
10,g,5
11,h,6
12,i,7
13,j,8
14,k,9
15,l,10
16,m,11
17,n,12
18,o,13
19,p,14
20,q,15

Output based on that sample data:

Percentile   average       sessions
0%           5.142857      1
50%          11.666667     1
80%          30.000000     2
99%          100.000000    1

To cross-check, let's look at the 80% bucket. I've got 17 users, so the 80th percentile cutoff puts the top four spenders at or above it. The top user is already in the 99% bucket, so the 80% bucket holds ranks 2, 3, and 4 - users a, e, and q, who spent 45, 30, and 15 respectively, giving an average of 30.
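That arithmetic can be checked directly (a trivial sketch using the sample data above):

```python
# Cross-check of the 80% bucket by hand: users a, e, and q.
a = 10 + 15 + 20   # a's three sessions
e = 20 + 10        # e's two sessions
q = 15             # q's single session
average = (a + e + q) / 3
print(average)     # average payment in the 80% bucket
```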



martin_mueller
SplunkTrust

That yields this table for my dummy data:

Percentile   average       sessions
99%          100           1
80%          47.500000     1.750000
50%          26            1.300000
0%           17.411765     1.176471

As a cross-check, running this:

| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| stats avg(total)

yields an average total per user of 17.411765, just as predicted by the 0% value 🙂


martin_mueller
SplunkTrust

Sure, it just adds a bit of post-processing:

| inputlookup sessions.csv
| stats count as sessions sum(amount) as total by user
| eventstats p99(total) as p99 p80(total) as p80 p50(total) as p50
| eval Percentile = case(total >= p99, "99%", total >= p80, "80%", total >= p50, "50%", 1=1, "0%")
| stats count sum(total) as total sum(sessions) as sessions by Percentile
| sort - Percentile
| streamstats sum(*) as running_*
| eval average = running_total / running_count
| eval sessions = running_sessions / running_count
| table Percentile average sessions
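The streamstats post-processing can be sketched outside Splunk too. The per-bucket counts and sums below were worked out by hand from the sample CSV in this thread (a Python illustration of the cumulative logic, not Splunk code):

```python
# Per-bucket aggregates (count, sum of totals, sum of sessions) from the
# exclusive grouping, listed highest percentile first -- this ordering
# mirrors "| sort - Percentile". Python dicts preserve insertion order.
per_bucket = {
    "99%": (1, 100, 1),   # user c
    "80%": (3, 90, 6),    # users a, e, q
    "50%": (6, 70, 6),    # users d, l, m, n, o, p
    "0%":  (7, 36, 7),    # users b, f, g, h, i, j, k
}

# "| streamstats sum(*) as running_*": accumulate down the sorted list so
# each row's averages cover its own bucket plus every bucket above it.
running = {}
run_count = run_total = run_sessions = 0
for bucket, (count, total, sess) in per_bucket.items():
    run_count += count
    run_total += total
    run_sessions += sess
    running[bucket] = (round(run_total / run_count, 6),
                       round(run_sessions / run_count, 6))
    print(bucket, *running[bucket])
```

This reproduces the cumulative table martin posted: the 0% row covers all 17 users, so its averages are the overall per-user averages.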

gtran
New Member

Thanks so much for this explanation. I think my biggest mistake was using stats to calculate the actual percentile values instead of eventstats. Is there any way for a user to be included in multiple groups? So the top user would be in the 99%, 80% and 50%?


somesoni2
Revered Legend

Could you post some sample events?


linu1988
Champion

will it be

....|eval percentile=case(Payment=="50$" AND Payment=="100$","50%",Payment="100$","100%",cases..)|stats count(Sessions) as Session by percentile,Payment


gtran
New Member

It is a bit confusing... The top 50% would be both the person who paid $50 and the person who paid $100, while the 99th percentile is the top 1%, so just the guy who paid $100.

I have not looked into rangemap yet, thanks for the tip!


linu1988
Champion

So does that mean the 99% is the person who has paid $100, and the one who paid $50 is in the 50%? A little confusing. You could try

eval per=case

or

rangemap

i guess that is the case here.
