Splunk Search

## How to compute stats (mean, median, mode) over a frequency table?

Engager

I have a search which returns the result as frequency table:

``````
0           6
1           4
2           1
5           1
``````

Basically, 6 users have uploaded 0 times, 4 users uploaded 1 time, and so on.

Now I want to compute stats such as the mean, median, and mode. Is there an easier way to compute stats directly on a frequency table such as this? (ie mean should be 0.916666, and mode is 0) This seems to be something that is required quite often, yet I haven't found a way to do it.

I understand there's way to calculate stats on the data before I turn it into a frequency table, the problem is that the query is rather complex (because in order to include the `upload == 0` case, I had to do a lot of extra work). In other words, the underlying logs have no events when there are no uploads, but to calculate accurate statistics, the 0 events must be taken into consideration.

Tags (5)
1 Solution Splunk Employee

If you're frequency table is generated by you running `stats count`, and then appending the other items, you can do it most efficiently with:

``````source=mysourcedata
| append
| eval psrsvd_v=1
``````

or, to get back the above frequency table:

``````source=mysourcedata
| append
| eval psrsvd_v=1
| stats count as frequency by uploads
``````

if you want, you can generate the subsearch into a file with outputcsv or outputlookup, then fetch it back in and append it. Splunk Employee

If you're frequency table is generated by you running `stats count`, and then appending the other items, you can do it most efficiently with:

``````source=mysourcedata
| append
| eval psrsvd_v=1
``````

or, to get back the above frequency table:

``````source=mysourcedata
| append
| eval psrsvd_v=1
| stats count as frequency by uploads
``````

if you want, you can generate the subsearch into a file with outputcsv or outputlookup, then fetch it back in and append it. Splunk Employee

Note the above will let you calculate mean(), stdev(), count(), and any percentile, along with a few other functions. Adding max(), min(), first(), last(), etc aren't that hard, but left as an exercise. Splunk Employee

No easy way to compute all of these in a single simple query.

mean:

``````... | stats
sum(frequency) as totalcount
``````

mode:

``````... | sort - frequency | head 1 | eval mode=uploads
``````

median:

``````... | sort uploads
| eventstats sum(frequency) as totalcount | eval medianrank=floor((totalcount/2))
| streamstats global=f current=t window=0
sum(frequency) as maxrank
| where (maxrank-frequency <= medianrank) AND (medianrank < frequency)
``````

trickery to combine them:

``````... | sort - frequency
| eventstats
sum(frequency) as totalcount  