Splunk Search

How can I group by same amount of records? (percentiles)

New Member

Hi everyone, Splunk noob here.. so any I help I would be grateful!. I've been trying to use the percX() function without success but I think it should be possible to use it to get this done.

Lets say I have a list of records with scores. I need to group them in buckets with the same amount of records on each.

I have 800 records with the field "score" (integer value) and I need to group them in groups of 20% (sorted by the score). The first bucket would be "the worst 20%", the second bucket the "second worst 20%" and so on.

At the end I need to get a table like this:

Bucket | Avg(some_field) | count
Bucket1 | X | 160 <--- The worst 20%
Bucket2 | X | 160
Bucket3 | X | 160
Bucket4 | X | 160
Bucket5 | X | 160 <--- The top 20%


0 Karma

Re: How can I group by same amount of records? (percentiles)


So, yes. 🙂

I don't have your data, so I mocked up some that's close enough using the wind speed as reported by my Ecobee Thermostat - integers like yours, or close enough I think, for a month some time last year. It should be pretty easy to modify this technique and make it fit your data exactly.

index="ecobee3" "Wind Speed _km_h">0 
| eventstats perc20("Wind Speed _km_h") AS WS20 
| eventstats perc40("Wind Speed _km_h") AS WS40 
| eventstats perc60("Wind Speed _km_h") AS WS60
| eventstats perc80("Wind Speed _km_h") AS WS80
| eval Ranking=case('Wind Speed _km_h'<=WS20,20,'Wind Speed _km_h'<=WS40,40,'Wind Speed _km_h'<=WS60,60,'Wind Speed _km_h'<=WS80,80,1==1,100)
| stats count by Ranking | sort + Ranking

First, in line one I search my index and also for where "Wind Speed kmh" is greater than zero, simply because it's often zero. You don't have to strip that out if you don't want.

The next 4 lines add to all events the 4 values for perc20, perc40, perc60 and perc80 of my wind speeds (Called WS20 etc...) by using eventstats. If you just run it up to that point in the search, each event will have a WS20, WS40 and so on, and they should ascend reasonably - in my case they're, I think 7, 11, 14, and 16? That makes sense for wind speeds if I don't count zeros.

Then in line 6, I create a new field called Ranking that puts each event in a ranking system from 20 to 100 (by 20's) based on which WSXX it's in. This uses an eval case.

The last line(s) is a stats count by Ranking then resorting it because, well, it wasn't sorted. Now it is!

So, see if you can make that work for you, and if not (or it's almost but not quite perfect) reply back with what else you need or some sample events!