Splunk Search

## How to compute stats (mean, median, mode) over a frequency table?

I have a search which returns the result as frequency table:

``````
uploads     frequency
0           6
1           4
2           1
5           1
``````

Basically, 6 users have uploaded 0 times, 4 users uploaded 1 time, and so on.

Now I want to compute stats such as the mean, median, and mode. Is there an easier way to compute stats directly on a frequency table such as this? (i.e. the mean should be 0.916666…, and the mode is 0.) This seems to be something that is needed quite often, yet I haven't found a way to do it.

I understand there's a way to calculate stats on the data before turning it into a frequency table; the problem is that the query is rather complex (in order to include the `upload == 0` case, I had to do a lot of extra work). In other words, the underlying logs have no events when there are no uploads, but to calculate accurate statistics, the 0 events must be taken into account.
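For reference, the expected results can be sketched in plain Python by treating each (value, frequency) row as that many copies of the value (the numbers come from the table above; `freq_table` is just a name I've picked for the illustration):

```python
from statistics import median, mode

# Frequency table from the search: (uploads, number_of_users)
freq_table = [(0, 6), (1, 4), (2, 1), (5, 1)]

# Weighted mean: sum(value * frequency) / sum(frequency)
total = sum(v * f for v, f in freq_table)
count = sum(f for _, f in freq_table)
mean = total / count  # 11 / 12 ≈ 0.9167, matching the question

# Median and mode via expansion into individual observations
values = [v for v, f in freq_table for _ in range(f)]
print(mean, median(values), mode(values))
```

Note that `statistics.median` interpolates between the two middle values for an even count, giving 0.5 here; conventions that pick a single middle element would return 0 or 1 instead.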

### Solution

If your frequency table is generated by running `stats count` and then appending the missing zero-count rows, you can do it most efficiently with:

``````source=mysourcedata
| append [ ... ]
| eval psrsvd_v=1
``````

or, to get back the above frequency table:

``````source=mysourcedata
| append [ ... ]
| eval psrsvd_v=1
| stats count as frequency by uploads
``````

If you want, you can generate the subsearch into a file with `outputcsv` or `outputlookup`, then fetch it back in and append it.

Note that the above will let you calculate mean(), stdev(), count(), and any percentile, along with a few other functions. Adding max(), min(), first(), last(), etc. isn't that hard, but is left as an exercise.

There is no easy way to compute all of these in a single simple query.

mean:

``````... | eval weighted=uploads*frequency
| stats sum(weighted) as total, sum(frequency) as totalcount
| eval mean=total/totalcount
``````

mode:

``````... | sort - frequency | head 1 | eval mode=uploads
``````

median:

``````... | sort uploads
| eventstats sum(frequency) as totalcount | eval medianrank=floor((totalcount/2))
| streamstats global=f current=t window=0
sum(frequency) as maxrank
| where (maxrank-frequency <= medianrank) AND (medianrank < maxrank)
``````
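The cumulative-rank logic behind that median search can be checked with a small Python sketch (variable names mirror the SPL fields; it assumes the rows are sorted by `uploads`). Note that this picks the single value at rank `floor(totalcount/2)`, which for the example table is 1 rather than the interpolated 0.5:

```python
rows = [(0, 6), (1, 4), (2, 1), (5, 1)]  # (uploads, frequency), sorted by uploads

totalcount = sum(f for _, f in rows)
medianrank = totalcount // 2  # floor(totalcount / 2)

# streamstats-style running sum of frequency
maxrank = 0
median_value = None
for uploads, frequency in rows:
    maxrank += frequency
    # keep the row whose cumulative range covers the median rank:
    # maxrank - frequency <= medianrank < maxrank
    if maxrank - frequency <= medianrank < maxrank:
        median_value = uploads

print(median_value)
```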

trickery to combine them:

``````... | sort uploads
| eventstats
sum(frequency) as totalcount
| eval medianrank=floor((totalcount/2))
| streamstats current=t window=0
sum(frequency) as maxrank
| eval median=if((maxrank-frequency <= medianrank) AND (medianrank < maxrank), uploads, null())
| where isnotnull(median)
``````

If that's the case, I'm thinking of writing my own custom command to expand the frequency table... this way I can pipe the results to stats and do mean(..), median(..), p90(..), etc. Unless something like this already exists...
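A custom command may not be necessary: SPL itself can expand the table back into one row per user, if I'm not mistaken, with something like `eval n=mvrange(0,frequency) | mvexpand n`, after which a plain `stats mean(uploads) median(uploads) perc90(uploads)` applies directly. The equivalent expansion in Python, as a sanity check of what that buys you (the crude percentile index below is an illustration, not Splunk's exact algorithm):

```python
freq_table = [(0, 6), (1, 4), (2, 1), (5, 1)]  # (uploads, frequency)

# Expand each row into `frequency` individual observations,
# which is what mvrange + mvexpand would do row by row in SPL.
expanded = [uploads for uploads, freq in freq_table for _ in range(freq)]

# Once expanded, any ordinary aggregate applies directly.
mean = sum(expanded) / len(expanded)
p90 = sorted(expanded)[int(0.9 * (len(expanded) - 1))]  # crude nearest-rank percentile
print(len(expanded), mean, p90)
```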
