Splunk Search

## How to compute stats (mean, median, mode) over a frequency table?

Engager

I have a search which returns the result as frequency table:

```
uploads     frequency
0           6
1           4
2           1
5           1
```

Basically, 6 users have uploaded 0 times, 4 users uploaded 1 time, and so on.

Now I want to compute stats such as the mean, median, and mode. Is there an easier way to compute stats directly on a frequency table like this? (i.e. the mean should be 0.916666 and the mode 0.) This seems like something that is needed quite often, yet I haven't found a way to do it.
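For reference, the expected numbers are easy to check outside Splunk. A minimal Python sketch (the table values are taken from the question) that expands the frequency table into one value per user and computes the three statistics:

```python
from statistics import mean, median, mode

# Frequency table from the question: (uploads, number_of_users)
freq_table = [(0, 6), (1, 4), (2, 1), (5, 1)]

# Expand the table back into one value per user: 12 values total
values = [uploads for uploads, count in freq_table for _ in range(count)]

print(mean(values))    # 11/12 ≈ 0.916666
print(median(values))  # 0.5 (average of the 6th and 7th of 12 sorted values)
print(mode(values))    # 0
```

Note that for an even count `statistics.median` averages the two middle values (0.5 here), whereas a rank-based approach like the one in the answers below picks a single element.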

I understand there's a way to calculate stats on the data before turning it into a frequency table; the problem is that the query is rather complex (in order to include the `upload == 0` case, I had to do a lot of extra work). In other words, the underlying logs have no events when there are no uploads, but to calculate accurate statistics the zero-upload users must be taken into account.

1 Solution
Splunk Employee

If your frequency table is generated by running `stats count`, and then appending the other items, you can do it most efficiently with:

```
source=mysourcedata
| append
| eval psrsvd_v=1
```

or, to get back the above frequency table:

```
source=mysourcedata
| append
| eval psrsvd_v=1
| stats count as frequency by uploads
```

If you want, you can write the subsearch results out to a file with `outputcsv` or `outputlookup`, then fetch them back in and append them.


Splunk Employee

Note the above will let you calculate mean(), stdev(), count(), and any percentile, along with a few other functions. Adding max(), min(), first(), last(), etc. isn't hard either, but is left as an exercise.
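As a sanity check on what computing these "directly on the table" means mathematically, here is a sketch (in Python, for illustration only) of the weighted forms of mean and population standard deviation, computed without expanding the table:

```python
import math

# (uploads, frequency) rows from the question
freq_table = [(0, 6), (1, 4), (2, 1), (5, 1)]

n = sum(f for _, f in freq_table)                 # total users: 12
mean = sum(x * f for x, f in freq_table) / n      # sum(x*f)/sum(f) = 11/12
var = sum(f * (x - mean) ** 2 for x, f in freq_table) / n
stdev = math.sqrt(var)

print(mean, stdev)  # ≈ 0.917, ≈ 1.382
```

The same weighted sums are what a `stats`-based search would accumulate per row, which is why mean, stdev, and percentiles are all recoverable from the frequency representation.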

Splunk Employee

No easy way to compute all of these in a single simple query.

mean:

```
... | eval total=uploads*frequency
| stats sum(total) as totalsum, sum(frequency) as totalcount
| eval mean=totalsum/totalcount
```

mode:

```
... | sort - frequency | head 1 | eval mode=uploads
```

median:

```
... | sort uploads
| eventstats sum(frequency) as totalcount | eval medianrank=floor(totalcount/2)
| streamstats global=f current=t window=0
sum(frequency) as maxrank
| where (maxrank-frequency <= medianrank) AND (medianrank < maxrank)
```
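The intended cumulative-rank logic is: after sorting by `uploads`, each row covers ranks `[maxrank - frequency, maxrank)` of the sorted user list, and the median row is the one whose span contains rank `floor(totalcount/2)`. A Python sketch of that logic (note it picks the upper of the two middle elements for an even count):

```python
from math import floor

# Frequency table sorted by uploads, as the search requires
freq_table = [(0, 6), (1, 4), (2, 1), (5, 1)]

totalcount = sum(f for _, f in freq_table)
medianrank = floor(totalcount / 2)  # 0-indexed rank 6 of 12

maxrank = 0  # running cumulative frequency, like streamstats sum(frequency)
for uploads, frequency in freq_table:
    maxrank += frequency
    # this row covers sorted ranks [maxrank - frequency, maxrank)
    if maxrank - frequency <= medianrank < maxrank:
        print("median:", uploads)  # rank 6 falls in the uploads=1 row
        break
```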

trickery to combine them:

```
... | sort uploads
| eventstats
sum(frequency) as totalcount
| eval medianrank=floor(totalcount/2)
| streamstats current=t window=0
sum(frequency) as maxrank
| eval median=if((maxrank-frequency <= medianrank) AND (medianrank < maxrank), uploads, null())
| where isnotnull(median)
```
Engager

If that's the case, I'm thinking of writing my own custom command to expand the frequency table... this way I can pipe the results to stats and do mean(..), median(..), p90(..), etc. Unless something like this already exists...
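The expansion the commenter describes (turn each row into `frequency` copies of `uploads`, then pipe to ordinary stats functions) is simple to sketch. A hedged stand-in for such a custom command in Python, with a nearest-rank p90 as an example (Splunk's percentile functions may use a different interpolation convention):

```python
import math

def expand(freq_table):
    """Yield each value once per its frequency, like the proposed custom command."""
    for value, frequency in freq_table:
        yield from [value] * frequency

values = sorted(expand([(0, 6), (1, 4), (2, 1), (5, 1)]))

# p90 by the nearest-rank method: the value at 1-indexed rank ceil(0.9 * n)
p90 = values[math.ceil(0.9 * len(values)) - 1]
print(p90)  # 2
```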
