In stats calculations, I use avg() and median(), but I saw other people using "Xth percentile" functions like perc95().
What does it do exactly?
see docs
http://docs.splunk.com/Documentation/Splunk/5.0.4/SearchReference/Commonstatsfunctions
This function returns the X-th percentile value of the field Y, where X is an integer between 1 and 99. The functions perc, p, and upperperc give approximate values for the integer percentile requested. The approximation algorithm used provides a strict bound of the actual value for any percentile. The functions perc and p return a single number that represents the lower end of that range, while upperperc gives the approximate upper bound. exactperc provides the exact value, but will be very expensive for high-cardinality fields.
The Xth-percentile function sorts the results in increasing order.
Then, considering that 0% is the lowest and 100% the highest, it picks the value that corresponds to the position of the X% mark.
To clarify, perc50() is equivalent to median(): it picks the value in the middle of the range.
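The sort-then-pick idea can be sketched with the nearest-rank convention. This is only an illustration of the concept; Splunk's perc() uses its own internal approximation algorithm, and conventions for the exact pick (especially the middle of an even-sized set) can differ.

```python
import math

def perc(values, x):
    """Nearest-rank Xth percentile: sort ascending, then pick the value
    whose position covers X% of the data. A sketch of the idea only --
    not Splunk's exact internal algorithm."""
    ordered = sorted(values)
    # 1-based rank of the X% position
    rank = max(1, math.ceil(x / 100 * len(ordered)))
    return ordered[rank - 1]
```

For example, `perc(values, 50)` lands on the middle of the range, matching the idea that perc50() behaves like median().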
see other explanations
http://www.semaphore.com/blog/2011/04/04/95th-percentile-bandwidth-metering-explained-and-analyzed
A good example is worth all the explanations:
with events like "value=Y"
source=mytest | stats list(value) avg(value) median(value) perc95(value)
list of values = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
avg(value)=5.500000
median(value)=6
perc95(value)=10
list of values = {1, 1, 1, 10, 9, 1, 1, 1, 1, 1}
avg(value)=2.700000
median(value)=1
perc95(value)=10
list of values = {10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1}
avg(value)=5.818182
median(value)=5
perc95(value)=10
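The perc95 results above can be reproduced with the nearest-rank convention (a sketch; Splunk's approximation algorithm may differ, and the standard-library median interpolates the middle pair for even-sized sets, so it reports 5.5 for the first list where Splunk shows 6):

```python
import math
from statistics import mean, median

def perc(values, x):
    # nearest-rank Xth percentile: sort ascending, pick the X% position
    ordered = sorted(values)
    return ordered[max(1, math.ceil(x / 100 * len(ordered))) - 1]

examples = [
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    [1, 1, 1, 10, 9, 1, 1, 1, 1, 1],
    [10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1],
]
for vals in examples:
    print(round(mean(vals), 6), median(vals), perc(vals, 95))
```

In all three lists, the 95% position falls on the largest value, so perc95(value)=10 every time even though the averages and medians differ widely.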
Thanks @yannK! Hope all is well! Time flies huh? 2013...**bleep**!
I have come from the future to add an example where I applied perc95 to application access logging - an oft-requested party trick from app developers.
I stumbled on this post while working on analyzing some service mesh logging and reading the perc95 docs.
The year is now 2021 and I have events from a traffic gateway (Istio - think access_combined type stuff) and I receive access logging events for my "Ingress traffic".
[2021-02-28T13:35:35.921Z] "GET /code/mattymo/docker_addon_builder/-/branches/all?sort=updated_asc HTTP/1.1" 200 - "-" "-" 0 9656 574 570 "185.191.171.6" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" "349525cc-6fff-9c55-af95-986cb31bdf70" "mattymo.io" "10.1.74.210:443" outbound|443||gitlab.gitlab.svc.cluster.local - 10.1.74.189:443 185.191.171.6:16156 mattymo.io -
This event then gets parsed to provide me many fields, but the two I'll use here are "duration" and "upstream_cluster".
In the event above, for example, duration=574 and upstream_cluster="outbound|443||gitlab.gitlab.svc.cluster.local".
As an app developer, performance analyst, or SRE... or frankly as anyone who cares, I will invariably want to ask Splunk what my application response times are.
index=k8s pod="istio-ingressgateway*"
| stats count, perc50(duration) AS "Median Duration", perc95(duration) AS "95th Percentile Duration" by cluster_name, upstream_cluster
| sort - "95th Percentile Duration"
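Outside of Splunk, the same per-service rollup can be sketched in plain Python. The field names (duration, upstream_cluster) follow the event above, but the sample records and the nearest-rank percentile helper are illustrative assumptions, not Splunk's exact internals:

```python
import math
from collections import defaultdict
from statistics import median

def perc(values, x):
    # nearest-rank Xth percentile (sketch; Splunk's perc() approximates)
    ordered = sorted(values)
    return ordered[max(1, math.ceil(x / 100 * len(ordered))) - 1]

# hypothetical parsed events: (upstream_cluster, duration in ms)
events = [
    ("outbound|443||gitlab.gitlab.svc.cluster.local", 574),
    ("outbound|443||gitlab.gitlab.svc.cluster.local", 120),
    ("outbound|443||gitlab.gitlab.svc.cluster.local", 980),
    ("outbound|80||ghost.ghost.svc.cluster.local", 45),
    ("outbound|80||ghost.ghost.svc.cluster.local", 60),
]

# group durations by upstream_cluster, like `stats ... by upstream_cluster`
by_cluster = defaultdict(list)
for cluster, duration in events:
    by_cluster[cluster].append(duration)

rows = [
    (cluster, len(ds), median(ds), perc(ds, 95))
    for cluster, ds in by_cluster.items()
]
# descending by 95th percentile, like `| sort - "95th Percentile Duration"`
rows.sort(key=lambda r: r[3], reverse=True)
for row in rows:
    print(row)
```

The slowest upstream floats to the top, which is exactly the drill-down the SPL table gives you.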
This table gets me started with analyzing web traffic and the time it takes to serve my gitlab, ghost and Splunk apps! I can immediately start to drill into customer requests that take large amounts of time to serve!
Here's to 8 more years 🙂