As far as this documentation shows, Splunk does not currently have a built-in geometric mean function.
However, to get the geometric mean of our_field, one could do:
... | eval natural_logs = ln(our_field)
| stats mean(natural_logs) as log_mean
| eval geometric_mean = exp(log_mean)
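If we think for a moment about what the geometric mean really is, namely the nth root of the product of n numbers:
$$\sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
we can express this in terms of logarithms, since multiplication becomes a sum and the power becomes multiplication:
$$\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right)$$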
The right-hand side formula above is generally the preferred alternative for implementation in computer languages. This is because calculating the product of many numbers can lead to an arithmetic overflow or arithmetic underflow. This is less likely to occur when you first take the logarithm of each number and sum these.
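To make the overflow and underflow point concrete, here is a minimal Python sketch (Python is not part of the original answer, and the function names geometric_mean_naive and geometric_mean_logs are made up for illustration). It compares multiplying everything first against the log-based form, assuming all inputs are strictly positive floats:

```python
import math

def geometric_mean_naive(values):
    # Multiply everything first, then take the nth root.
    product = math.prod(values)  # requires Python 3.8+
    return product ** (1.0 / len(values))

def geometric_mean_logs(values):
    # Same three steps as the Splunk search: ln, mean, exp.
    log_mean = math.fsum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean)

large = [1e10] * 1000   # true geometric mean is 1e10
small = [1e-10] * 1000  # true geometric mean is 1e-10

print(geometric_mean_naive(large))  # inf: the product overflows a double
print(geometric_mean_naive(small))  # 0.0: the product underflows to zero
print(geometric_mean_logs(large))   # close to 1e10
print(geometric_mean_logs(small))   # close to 1e-10
```

The log-based version stays in a comfortable numeric range because it only ever sums values around 23 in magnitude here, instead of building a product with thousands of digits.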
So in Splunk, working backwards from that formula, we can:
1.) Take the natural log with the eval function ln()
2.) Average the logs with stats mean
3.) Take the exponential with the eval function exp()
A second approach would be to use the R app for Splunk.
1.) Download the app
2.) Add the path to your R bin in $SPLUNK_HOME/etc/apps/r/default/r.conf
e.g. r=/usr/bin/R
3.) Pipe to R in your search command like this:
| table some_field
| r "exp(mean(log(data.matrix(input)))) -> output"
Here is a slightly more complicated example:
sourcetype=ps earliest=-4m
| multikv fields RSZ_KB
| search RSZ_KB > 0 AND VSZ_KB > 0
| table RSZ_KB VSZ_KB
| r "
  gm_mean = function(x, na.rm=TRUE){
    exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
  }
  data <- data.matrix(input);
  output <- apply(data, 2, gm_mean)"
provides
x
132.902175678696
34188.4285350717
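One detail of gm_mean worth knowing: it drops non-positive values from the log sum but still divides by length(x), so dropped values count toward n. The following is a rough Python port for illustration only (not part of the R app; None stands in for R's NA, and gm_mean_positive_only is a hypothetical variant) showing how the choice of denominator changes the result:

```python
import math

def gm_mean(values):
    # Port of the R helper: sum the logs of strictly positive values only...
    positive = [v for v in values if v is not None and v > 0]
    # ...but divide by the full length, as the R code does with length(x).
    return math.exp(math.fsum(math.log(v) for v in positive) / len(values))

def gm_mean_positive_only(values):
    # Hypothetical variant: divide by the number of values actually used.
    positive = [v for v in values if v is not None and v > 0]
    return math.exp(math.fsum(math.log(v) for v in positive) / len(positive))

print(gm_mean([2, 8]))                   # about 4
print(gm_mean([2, 8, 0]))                # about 2.52, the zero still counts in n
print(gm_mean_positive_only([2, 8, 0]))  # about 4
```

In the search above, the `search RSZ_KB > 0 AND VSZ_KB > 0` filter removes non-positive rows before R sees them, so both versions would agree on that data.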
Thanks for this too. We'll be using R for more and more stats capabilities, so thanks for pointing it out.
Please correct me if I'm wrong too, I was just trying to brainstorm a way...
That's really good, and it would be good to have this implemented. The geometric mean is useful when comparing ratios. Thanks for the help, I'll submit a request.