What do perc95 and the other perc* stats functions do?

mataharry
Communicator

In stats calculations I use avg() and median(), but I have seen other people use "Xth percentile" functions like perc95().
What exactly do they do?

See the docs:
http://docs.splunk.com/Documentation/Splunk/5.0.4/SearchReference/Commonstatsfunctions

This function returns the X-th percentile value of the field Y, where X is an integer between 1 and 99. The functions perc, p, and upperperc give approximate values for the integer percentile requested. The approximation algorithm used provides a strict bound of the actual value for any percentile. The functions perc and p return a single number that represents the lower end of that range, while upperperc gives the approximate upper bound. exactperc provides the exact value, but will be very expensive for high-cardinality fields.
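The Python sketch below gives some intuition for two points in the docs. An approximate percentile comes as a range with a lower end (perc, p) and an upper end (upperperc), and exactperc is costly. This is not Splunk's approximation algorithm. It substitutes a plain fixed-width histogram, which stores only a count per bucket, so it can only report which bucket the percentile falls in. The exact version has to keep and sort every value, which hints at why an exact answer gets expensive on large data. Several details are assumptions for illustration: the function names, the bucket width, the sample durations, and the position rule (go X% of the way from the lowest to the highest value and round up to a whole index).

```python
import math


def exact_percentile(values, x):
    # Keeps and sorts every value, so memory grows with the size of the data.
    ordered = sorted(values)
    # Assumed position rule: X% of the way from index 0 (lowest) to
    # index n-1 (highest), rounded up using integer arithmetic.
    index = (x * (len(ordered) - 1) + 99) // 100
    return ordered[index]


def bucketed_percentile_bounds(values, x, bucket_width):
    # Keeps only a count per fixed-width bucket (a swapped-in histogram,
    # NOT Splunk's algorithm), so it can only report a range.
    counts = {}
    n = 0
    for v in values:
        bucket = math.floor(v / bucket_width)
        counts[bucket] = counts.get(bucket, 0) + 1
        n += 1
    if n == 0:
        raise ValueError("no values")
    index = (x * (n - 1) + 99) // 100  # same position rule as above
    seen = 0
    for bucket in sorted(counts):
        seen += counts[bucket]
        if seen > index:
            # The exact value lies in [lower, upper). Loosely, the lower edge
            # plays the role of perc/p and the upper edge that of upperperc.
            return bucket * bucket_width, (bucket + 1) * bucket_width


durations = [12, 15, 18, 22, 40, 41, 43, 95, 120, 480]  # made-up sample
print(exact_percentile(durations, 95))                  # 480
print(bucketed_percentile_bounds(durations, 95, 100))   # (400, 500)
```

With a bucket width of 100 the sketch can only say that the 95th percentile lies somewhere in [400, 500). Narrower buckets tighten that range at the cost of more counters.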

yannK
Splunk Employee

The Xth percentile function sorts the values in increasing order.
Then, treating the lowest value as 0% and the highest as 100%, it picks the exact value at the position that corresponds to X%.

To clarify, perc50() is equivalent to median(): it picks the value in the middle of the sorted values.
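Here is a minimal Python sketch of that rule. The answer doesn't say what happens when X% lands between two values, so this sketch assumes the position is rounded up to the next whole index. That choice happens to reproduce the Splunk results in the examples of this answer, but it is an assumption and not documented Splunk behavior. `percentile` and `median` are hypothetical helpers, not Splunk functions.

```python
def percentile(values, x):
    """Hypothetical helper, not Splunk's code.

    Sort ascending, treat index 0 as 0% and index n-1 as 100%, and return
    the value X% of the way along, rounding the position up.
    """
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    # Integer ceiling of x * (n - 1) / 100 avoids floating-point surprises.
    index = (x * (len(ordered) - 1) + 99) // 100
    return ordered[index]


def median(values):
    # Per the answer, perc50() and median() pick the same middle value.
    return percentile(values, 50)


print(percentile([3, 1, 2], 50))  # 2
print(median([4, 1, 3, 2]))       # 3 (the upper of the two middle values)
```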

For another explanation, see:
http://www.semaphore.com/blog/2011/04/04/95th-percentile-bandwidth-metering-explained-and-analyzed

A good example is worth all the explanations.
With 10 events like "value=Y" (11 events in the third case):

source=mytest | stats list(value) avg(value) median(value) perc95(value)

list of values = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
avg(value)=5.500000
median(value)=6
perc95(value)=10

list of values = {1, 1, 1, 10, 9, 1, 1, 1, 1, 1}
avg(value)=2.700000
median(value)=1
perc95(value)=10

list of values = {10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1}
avg(value)=5.818182
median(value)=5
perc95(value)=10
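The Python sketch below recomputes avg, median and perc95 for the three lists as a check on the arithmetic. It uses a hypothetical percentile helper that rounds the X% position up to a whole index. That rule is an assumption, not Splunk's code. Look at the first list: the textbook median of 1..10 is 5.5 (the mean of 5 and 6), yet Splunk returned 6. Rounding up reproduces that result.

```python
def percentile(values, x):
    # Hypothetical helper: sort, then take the value X% of the way from the
    # lowest (index 0) to the highest (index n-1), rounding the index up.
    ordered = sorted(values)
    index = (x * (len(ordered) - 1) + 99) // 100
    return ordered[index]


def summarize(values):
    # Mirrors: stats list(value) avg(value) median(value) perc95(value)
    return {
        "avg": round(sum(values) / len(values), 6),
        "median": percentile(values, 50),
        "perc95": percentile(values, 95),
    }


examples = [
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    [1, 1, 1, 10, 9, 1, 1, 1, 1, 1],
    [10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1],  # 11 values
]
for values in examples:
    print(values, summarize(values))
```

Running it prints avg 5.5, 2.7 and 5.818182, median 6, 1 and 5, and perc95 10 in all three cases. These match the results quoted in the answer. The third list has 11 values, so its median is simply the 6th sorted value.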

mattymo
Splunk Employee

Thanks @yannK! Hope all is well! Time flies huh? 2013...**bleep**!

I have come from the future to add an example where I applied perc95 to application access logging, a party trick app developers often ask for.

I stumbled on this post while analyzing some service mesh logging and reading the perc95 docs.

The year is now 2021, and I receive access logging events for my "Ingress traffic" from a traffic gateway (Istio; think access_combined-style data).

[2021-02-28T13:35:35.921Z] "GET /code/mattymo/docker_addon_builder/-/branches/all?sort=updated_asc HTTP/1.1" 200 - "-" "-" 0 9656 574 570 "185.191.171.6" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" "349525cc-6fff-9c55-af95-986cb31bdf70" "mattymo.io" "10.1.74.210:443" outbound|443||gitlab.gitlab.svc.cluster.local - 10.1.74.189:443 185.191.171.6:16156 mattymo.io -

This event then gets parsed into many fields, but the two I'll use here are "duration" and "upstream_cluster".

In the event above, for example, duration=574 and upstream_cluster="outbound|443||gitlab.gitlab.svc.cluster.local".
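The post doesn't show how these fields get extracted in Splunk. As an illustration only, here is a Python regex written against the sample event above that pulls out duration and upstream_cluster by position. It assumes the field order seen in that one line. The pattern and the other group names (bytes_sent, authority, and so on) are hypothetical labels, not a Splunk or Istio-provided extraction.

```python
import re

# Hypothetical pattern matched to the field order of the sample event only.
ISTIO_ACCESS_RE = re.compile(
    r'^\[(?P<start_time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<response_code>\d+) (?P<response_flags>\S+) '
    r'"[^"]*" "[^"]*" '                       # two quoted fields ("-" in the sample)
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) '
    r'(?P<duration>\d+) (?P<upstream_service_time>\S+) '
    r'"[^"]*" "[^"]*" "[^"]*" '               # client IP, user agent, request id
    r'"(?P<authority>[^"]*)" "(?P<upstream_host>[^"]*)" '
    r'(?P<upstream_cluster>\S+)'
)


def parse_access_line(line):
    """Return a dict of the named fields, or None if the line doesn't match."""
    match = ISTIO_ACCESS_RE.match(line)
    return match.groupdict() if match else None


sample = (
    '[2021-02-28T13:35:35.921Z] "GET /code/mattymo/docker_addon_builder/-/branches/all?sort=updated_asc HTTP/1.1" '
    '200 - "-" "-" 0 9656 574 570 "185.191.171.6" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" '
    '"349525cc-6fff-9c55-af95-986cb31bdf70" "mattymo.io" "10.1.74.210:443" '
    'outbound|443||gitlab.gitlab.svc.cluster.local - 10.1.74.189:443 185.191.171.6:16156 mattymo.io -'
)
fields = parse_access_line(sample)
print(fields["duration"], fields["upstream_cluster"])
# 574 outbound|443||gitlab.gitlab.svc.cluster.local
```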

As an app developer, performance analyst or SRE (or frankly anyone who cares), I will invariably want to ask Splunk what my application response times are.

index=k8s pod="istio-ingressgateway*"
| stats count, perc50(duration) AS "Median Duration", perc95(duration) AS "95th Percentile Duration" by cluster_name, upstream_cluster
| sort - "95th Percentile Duration"

[Screenshot: the resulting table of count, Median Duration and 95th Percentile Duration by cluster_name and upstream_cluster]

This table gets me started analyzing web traffic and the time it takes to serve my GitLab, Ghost and Splunk apps! I can immediately start drilling into customer requests that take a long time to serve!
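The Python sketch below is a rough analogue of `stats count, perc50(duration), perc95(duration) by cluster_name, upstream_cluster | sort - "95th Percentile Duration"`, for readers who want to see what that query computes. The events and the short upstream names ("gitlab", "ghost") are made up. The percentile helper uses an assumed round-the-position-up rule, not Splunk's approximate perc algorithm.

```python
from collections import defaultdict


def percentile(values, x):
    # Hypothetical helper (not Splunk's implementation): sort, then take the
    # value X% of the way from lowest to highest, rounding the index up.
    ordered = sorted(values)
    index = (x * (len(ordered) - 1) + 99) // 100
    return ordered[index]


def duration_table(events):
    """Group durations by (cluster_name, upstream_cluster) and summarize."""
    groups = defaultdict(list)
    for event in events:
        key = (event["cluster_name"], event["upstream_cluster"])
        groups[key].append(event["duration"])
    rows = [
        {
            "cluster_name": cluster,
            "upstream_cluster": upstream,
            "count": len(durations),
            "Median Duration": percentile(durations, 50),
            "95th Percentile Duration": percentile(durations, 95),
        }
        for (cluster, upstream), durations in groups.items()
    ]
    # Slowest tail first, like: sort - "95th Percentile Duration"
    rows.sort(key=lambda row: row["95th Percentile Duration"], reverse=True)
    return rows


# Made-up sample events for illustration only.
events = [
    {"cluster_name": "demo", "upstream_cluster": "gitlab", "duration": d}
    for d in (120, 150, 180, 200, 574, 2300)
] + [
    {"cluster_name": "demo", "upstream_cluster": "ghost", "duration": d}
    for d in (30, 35, 40, 45, 60)
]
for row in duration_table(events):
    print(row)
```

In this made-up data, gitlab's median (200) looks healthy, but its 95th percentile (2300) exposes the slow tail. Those are the requests worth drilling into.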

Here's to 8 more years 🙂

- MattyMo