
What do perc95 and the other perc* stats functions do?

mataharry
Communicator

In stats calculations, I use avg() and median(), but I have seen other people using an "Xth percentile" function such as perc95().
What does it do exactly?

See the docs:
http://docs.splunk.com/Documentation/Splunk/5.0.4/SearchReference/Commonstatsfunctions

This function returns the X-th percentile value of the field Y, where X is an integer between 1 and 99. The functions perc, p, and upperperc give approximate values for the integer percentile requested. The approximation algorithm used provides a strict bound of the actual value for any percentile. The functions perc and p return a single number that represents the lower end of that range, while upperperc gives the approximate upper bound. exactperc provides the exact value, but will be very expensive for high-cardinality fields.


yannK
Splunk Employee

The Xth percentile function sorts the values in increasing order.
Then, treating 0% as the lowest value and 100% as the highest, it picks the value at the position that corresponds to X%.

To clarify, perc50() is equivalent to median(). It will pick the value in the middle of the range.
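
A minimal Python sketch of the sort-then-pick idea, purely for illustration; it is not Splunk's implementation. The rank rule (take the sorted element at index floor(X * n / 100), clamping 100% to the largest value) is an assumption, chosen because it agrees with the example where median(value)=6 for the values 10 down to 1. Splunk's perc functions may use a different rank or interpolation rule, and, as the docs note, perc and p may only approximate the value. The `percentile` name is hypothetical.

```python
def percentile(values, x):
    """Sketch of the Xth percentile: sort, then pick the value at the X% position.

    Assumed rank rule (hypothetical, not Splunk's documented method):
    index = floor(x * n / 100), with 100% clamped to the last (largest) value.
    x is expected to be an integer, as in Splunk's percX() functions.
    """
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    if not 0 <= x <= 100:
        raise ValueError("x must be between 0 and 100")
    ordered = sorted(values)                  # step 1: increasing order
    index = x * len(ordered) // 100           # step 2: map X% to a position
    return ordered[min(index, len(ordered) - 1)]


values = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(percentile(values, 0))    # 1  (0% is the lowest value)
print(percentile(values, 50))   # 6  (the middle, like median() in the example)
print(percentile(values, 95))   # 10
print(percentile(values, 100))  # 10 (100% is the highest value)
```

Integer arithmetic (`x * n // 100`) is used instead of floats so the position never drifts because of floating-point rounding.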

For another explanation, see:
http://www.semaphore.com/blog/2011/04/04/95th-percentile-bandwidth-metering-explained-and-analyzed

An example is worth all the explanations.
With a handful of events like "value=Y" (10 in the first two lists below, 11 in the third):

source=mytest | stats list(value) avg(value) median(value) perc95(value)

list of values = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
avg(value)=5.500000
median(value)=6
perc95(value)=10

list of values = {1, 1, 1, 10, 9, 1, 1, 1, 1, 1}
avg(value)=2.700000
median(value)=1
perc95(value)=10

list of values = {10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1}
avg(value)=5.818182
median(value)=5
perc95(value)=10
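
To check the numbers above, here is a small Python sketch that recomputes them. The helpers are assumptions, not Splunk code: `upper_median` takes the higher of the two middle values when the count is even, and `perc` takes the sorted element at index floor(X * n / 100). With those rules the output matches the three results listed above.

```python
from statistics import mean


def upper_median(values):
    # Assumption: with an even count, return the higher of the two middle
    # values, which matches median(value)=6 for the first list above.
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def perc(values, x):
    # Assumption: sorted element at index floor(x * n / 100), clamped to the max.
    ordered = sorted(values)
    return ordered[min(x * len(ordered) // 100, len(ordered) - 1)]


def summarize(values):
    # Mirrors: stats avg(value) median(value) perc95(value)
    return mean(values), upper_median(values), perc(values, 95)


examples = [
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    [1, 1, 1, 10, 9, 1, 1, 1, 1, 1],
    [10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1],
]

for values in examples:
    avg, med, p95 = summarize(values)
    print(f"avg(value)={avg:.6f} median(value)={med} perc95(value)={p95}")
# avg(value)=5.500000 median(value)=6 perc95(value)=10
# avg(value)=2.700000 median(value)=1 perc95(value)=10
# avg(value)=5.818182 median(value)=5 perc95(value)=10
```

The second list shows why percentiles complement the average: two large values pull avg up to 2.7 while the median stays at 1 and perc95 reports the tail value of 10.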

mattymo
Splunk Employee

Thanks @yannK! Hope all is well! Time flies huh? 2013...**bleep**!

I have come from the future to add an example where I applied perc95 to application access logging, an oft-requested party trick among app developers.

I stumbled on this post while analyzing some service mesh logging and reading the perc95 docs.

The year is now 2021, and I receive access logging events for my "Ingress traffic" from a traffic gateway (Istio; think access_combined-style data).

[2021-02-28T13:35:35.921Z] "GET /code/mattymo/docker_addon_builder/-/branches/all?sort=updated_asc HTTP/1.1" 200 - "-" "-" 0 9656 574 570 "185.191.171.6" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" "349525cc-6fff-9c55-af95-986cb31bdf70" "mattymo.io" "10.1.74.210:443" outbound|443||gitlab.gitlab.svc.cluster.local - 10.1.74.189:443 185.191.171.6:16156 mattymo.io -

This event is then parsed into many fields, but the two I'll use here are "duration" and "upstream_cluster".

In the event above, for example, duration=574 and upstream_cluster="outbound|443||gitlab.gitlab.svc.cluster.local".
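
The post doesn't say how the event is parsed into fields (in Splunk that is normally handled by field extractions). As a rough illustration outside Splunk, the Python sketch below splits the sample line on whitespace while keeping quoted strings together, then reads duration and upstream_cluster by position. The positions 8 and 15 are assumptions read off this single sample line, which appears to follow Istio's default Envoy access-log layout; a different log format would need different positions, and `extract_fields` is a hypothetical helper.

```python
import shlex

# The sample ingress gateway event from the post.
EVENT = (
    '[2021-02-28T13:35:35.921Z] "GET /code/mattymo/docker_addon_builder/-/branches/all?sort=updated_asc HTTP/1.1" '
    '200 - "-" "-" 0 9656 574 570 "185.191.171.6" '
    '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" '
    '"349525cc-6fff-9c55-af95-986cb31bdf70" "mattymo.io" "10.1.74.210:443" '
    'outbound|443||gitlab.gitlab.svc.cluster.local - 10.1.74.189:443 185.191.171.6:16156 mattymo.io -'
)

# Hypothetical token positions, read off this single sample line.
DURATION_POS = 8
UPSTREAM_CLUSTER_POS = 15


def extract_fields(event):
    # shlex.split breaks on whitespace but keeps "quoted strings" as one token
    # and drops the quotes, so "-" becomes - and the user agent stays intact.
    tokens = shlex.split(event)
    return {
        "duration": int(tokens[DURATION_POS]),
        "upstream_cluster": tokens[UPSTREAM_CLUSTER_POS],
    }


print(extract_fields(EVENT))
# {'duration': 574, 'upstream_cluster': 'outbound|443||gitlab.gitlab.svc.cluster.local'}
```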

As an app developer, performance analyst, or SRE (or frankly anyone who cares), I will invariably want to ask Splunk what my application response times are.

index=k8s pod="istio-ingressgateway*"
| stats count, perc50(duration) AS "Median Duration", perc95(duration) AS "95th Percentile Duration" by cluster_name, upstream_cluster
| sort - "95th Percentile Duration"
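
For readers who want to see what the search above computes, here is a rough Python analogue of `stats count, perc50(duration), perc95(duration) by upstream_cluster | sort - ...`, simplified to group by upstream_cluster only (the real search also splits by cluster_name). The events are made-up illustrative values, the cluster names are hypothetical, and the rank rule (sorted index floor(X * n / 100)) is an assumption rather than Splunk's exact method.

```python
from collections import defaultdict


def perc(values, x):
    # Assumed rank rule: sorted element at index floor(x * n / 100), clamped to the max.
    ordered = sorted(values)
    return ordered[min(x * len(ordered) // 100, len(ordered) - 1)]


def stats_by_cluster(events):
    """Rough analogue of:
    stats count, perc50(duration), perc95(duration) by upstream_cluster
    | sort - "95th Percentile Duration"
    """
    groups = defaultdict(list)
    for event in events:
        groups[event["upstream_cluster"]].append(event["duration"])
    rows = [
        (cluster, len(durations), perc(durations, 50), perc(durations, 95))
        for cluster, durations in groups.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)  # highest p95 first
    return rows


# Made-up durations for two hypothetical clusters (not real measurements).
events = [{"upstream_cluster": "cluster-a", "duration": d} for d in (120, 80, 95, 2000, 110)]
events += [{"upstream_cluster": "cluster-b", "duration": d} for d in (300, 310, 290, 305)]

for row in stats_by_cluster(events):
    print(row)
# ('cluster-a', 5, 110, 2000)
# ('cluster-b', 4, 305, 310)
```

The made-up data shows why the 95th percentile sits next to the median in the table: one slow request pushes cluster-a's p95 to 2000 while its median stays at 110, so cluster-a sorts to the top.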

[Screenshot of the resulting table]

This table gets me started analyzing web traffic and the time it takes to serve my GitLab, Ghost, and Splunk apps! I can immediately start drilling into the customer requests that take the longest to serve!

Here's to 8 more years 🙂

- MattyMo