Splunk Search

limit top to return percentage of total distinct count of a field

dadi
Path Finder

Hi guys,
I've the following problem: in my system there are events of users, and I want to get only the top 10% of the active users who appears in the logs. each log entry with the user Id counts and I need only the top 10% of the users who have the most log count.

My direction is to do it with top and limit, but I'm not sure how to calculate the value of limit:
search event=active | top userId limit=?

limit should be the 10% of total unique users ( ... | stats dc(UserId) ), but I'm not sure ho to calculate the total number of unique users in the same search query.

Any suggestions? Any idea is welcome.

Tags (3)
1 Solution

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

View solution in original post

lguinn2
Legend

Don't use top - use the stats command instead

event=active
| stats count by userId
| eventstats perc90(count) as percentileLimit
| where count >= percentileLimit

The eventstats command calculates the 90th percentile of the counts by userId. Then the where command eliminates any results that have a lower count.

But now I think I might have misunderstood the question - if you have 150 users, do you only want to see the top 15? Okay, try this

event=active
| stats count by userId
| eventstats count as userIdCount
| sort -count
| eval i=1
| accum i as indexField
| eval  top10Percent=round(userIdCount / 10)
| where indexField <= top10Percent
| table indexField userId count

That should work. Let me know if it doesn't...

dadi
Path Finder

Your second query worked like a charm. I might have learned a thing or two.
Thanks a lot!

0 Karma
Get Updates on the Splunk Community!

Splunk MCP & Agentic AI: Machine Data Without Limits

  Discover how the Splunk Model Context Protocol (MCP) Server can revolutionize the way your organization ...

Finding Based Detections General Availability

Overview  We’ve come a long way, folks, but here in Enterprise Security 8.4 I’m happy to announce Finding ...

Get Your Hands Dirty (and Your Shoes Comfy): The Splunk Experience

Hands-On Learning and Technical Seminars  Sometimes, you just need to see the code. For those looking for a ...