Splunk Search

limit top to return percentage of total distinct count of a field

dadi
Path Finder

Hi guys,
I've the following problem: in my system there are events of users, and I want to get only the top 10% of the active users who appears in the logs. each log entry with the user Id counts and I need only the top 10% of the users who have the most log count.

My direction is to do it with top and limit, but I'm not sure how to calculate the value of limit:
search event=active | top userId limit=?

limit should be the 10% of total unique users ( ... | stats dc(UserId) ), but I'm not sure ho to calculate the total number of unique users in the same search query.

Any suggestions? Any idea is welcome.

Tags (3)
1 Solution

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

View solution in original post

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

dadi
Path Finder

Your second query worked like a charm. I might have learned a thing or two.
Thanks a lot!

0 Karma
Get Updates on the Splunk Community!

Why You Can't Miss .conf25: Unleashing the Power of Agentic AI with Splunk & Cisco

The Defining Technology Movement of Our Lifetime The advent of agentic AI is arguably the defining technology ...

Deep Dive into Federated Analytics: Unlocking the Full Power of Your Security Data

In today’s complex digital landscape, security teams face increasing pressure to protect sprawling data across ...

Your summer travels continue with new course releases

Summer in the Northern hemisphere is in full swing, and is often a time to travel and explore. If your summer ...