Splunk Search

## How to compute stats (mean, median, mode) over a frequency table?

Engager

I have a search which returns the result as a frequency table:

``````
uploads     frequency
0           6
1           4
2           1
5           1
``````

Basically, 6 users have uploaded 0 times, 4 users uploaded 1 time, and so on.

Now I want to compute stats such as the mean, median, and mode. Is there an easier way to compute stats directly on a frequency table such as this? (i.e. the mean should be 0.916666, and the mode should be 0.) This seems like something that is needed quite often, yet I haven't found a way to do it.

I understand there's a way to calculate stats on the data before turning it into a frequency table; the problem is that the query is rather complex (in order to include the `upload == 0` case, I had to do a lot of extra work). In other words, the underlying logs have no events when there are no uploads, but to calculate accurate statistics, the 0 cases must be taken into account.
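For reference, the expected numbers can be checked outside Splunk. A quick Python sketch over the same frequency table (taking the median as the value at rank `floor(N/2)`, the same convention the answers below use):

```python
# Frequency table from the question: uploads -> number of users
freq = {0: 6, 1: 4, 2: 1, 5: 1}

total = sum(freq.values())                         # 12 users
mean = sum(u * n for u, n in freq.items()) / total
mode = max(freq, key=freq.get)                     # most frequent value

# Median: walk the values in order until the cumulative
# count passes rank floor(total/2)
half = total // 2
cum = 0
for u in sorted(freq):
    cum += freq[u]
    if cum > half:
        median = u
        break

print(mean, median, mode)  # ≈0.9167, 1, 0
```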

1 Solution
Splunk Employee

If your frequency table is generated by running `stats count` and then appending the extra items, you can do it most efficiently with:

``````source=mysourcedata
| append [ ... ]
| eval psrsvd_v=1
``````

or, to get back the above frequency table:

``````source=mysourcedata
| append [ ... ]
| eval psrsvd_v=1
| stats count as frequency by uploads
``````

If you want, you can write the subsearch results to a file with `outputcsv` or `outputlookup`, then fetch them back in and append them.

Splunk Employee

Note the above will let you calculate mean(), stdev(), count(), and any percentile, along with a few other functions. Adding max(), min(), first(), last(), etc. isn't that hard, but is left as an exercise.

Splunk Employee

There's no easy way to compute all of these in a single simple query.

mean:

``````... | stats
sum(frequency) as totalcount
sum(eval(uploads*frequency)) as totalsum
| eval mean=totalsum/totalcount
``````

mode:

``````... | sort - frequency | head 1 | eval mode=uploads
``````

median:

``````... | sort uploads
| eventstats sum(frequency) as totalcount | eval medianrank=floor((totalcount/2))
| streamstats global=f current=t window=0
sum(frequency) as maxrank
| where (maxrank-frequency <= medianrank) AND (medianrank < maxrank)
``````

trickery to combine them:

``````... | sort - frequency
| eventstats
sum(frequency) as totalcount
| eval medianrank=floor((totalcount/2))
| streamstats current=t window=0
sum(frequency) as maxrank
| eval median=if((maxrank-frequency <= medianrank) AND (medianrank < maxrank), uploads, null())
| where isnotnull(median)
``````
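The median step boils down to finding the row whose cumulative frequency range contains the median rank. A minimal Python sketch of that rank walk over the sample table, sorted by uploads:

```python
# Frequency table rows as (uploads, frequency), sorted by uploads
rows = [(0, 6), (1, 4), (2, 1), (5, 1)]

totalcount = sum(f for _, f in rows)   # eventstats sum(frequency)
medianrank = totalcount // 2           # floor(totalcount/2)

maxrank = 0                            # streamstats running sum
median = None
for uploads, frequency in rows:
    maxrank += frequency
    # keep the row whose cumulative range [maxrank-frequency, maxrank)
    # contains the median rank
    if maxrank - frequency <= medianrank < maxrank:
        median = uploads
        break

print(median)  # 1
```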
Engager

If that's the case, I'm thinking of writing my own custom command to expand the frequency table... this way I can pipe the results to stats and do mean(..), median(..), p90(..), etc. Unless something like this already exists...
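As a sketch of what such an expansion would do (a hypothetical helper, not an existing command): repeat each row `frequency` times, and ordinary aggregations then apply directly. Note that a true median over the expanded values averages the two middle values here, which can differ from the rank-based pick used in the answers above.

```python
from statistics import mean, median, mode

# Hypothetical expansion of the frequency table into raw values,
# the way a custom "expand" command might feed events to stats
freq_table = [(0, 6), (1, 4), (2, 1), (5, 1)]
values = [u for u, n in freq_table for _ in range(n)]  # 12 values

print(mean(values), median(values), mode(values))  # ≈0.9167, 0.5, 0
```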
