Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Splunk Search

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Community
- :
- Splunk Answers
- :
- Using Splunk
- :
- Splunk Search
- :
- Re: How to compute stats (mean, median, mode) over...

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark Topic
- Subscribe to Topic
- Mute Topic
- Printer Friendly Page

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

jyzhang

Engager

06-28-2011
02:55 PM

I have a search which returns the result as frequency table:

uploads frequency

0 6

1 4

2 1

5 1

Basically, 6 users have uploaded 0 times, 4 users uploaded 1 time, and so on.

Now I want to compute stats such as the mean, median, and mode. Is there an easier way to compute stats directly on a frequency table such as this? (ie mean should be 0.916666, and mode is 0) This seems to be something that is required quite often, yet I haven't found a way to do it.

I understand there's way to calculate stats on the data before I turn it into a frequency table, the problem is that the query is rather complex (because in order to include the `upload == 0`

case, I had to do a lot of extra work). In other words, the underlying logs have no events when there are no uploads, but to calculate accurate statistics, the 0 events must be taken into consideration.

1 Solution

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

gkanapathy

Splunk Employee

06-28-2011
06:39 PM

If you're frequency table is generated by you running `stats count`

, and then appending the other items, you can do it most efficiently with:

```
source=mysourcedata
| sistats mean(uploads) median(uploads) mode(uploads) by uploads
| append
[ stats count as psrsvd_sm_uploads
| eval psrsvd_ct_uploads= <number of entries with 0 uploads>
| eval psrsvd_gc=psrsvf_ct_uploads
| eval psrsvd_nc_uploads=psrsvd_ct_uploads
| eval psrsvd_rd_uploads="0e+00"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_sm_uploads=0
| eval psrsvd_ss_uploads=0
| eval psrsvd_v=1
| eval psrsvd_vm_uploads="0;"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_vt_uploads=0
| eval uploads=0
| stats mean(uploads) median(uploads) mode(uploads)
```

or, to get back the above frequency table:

```
source=mysourcedata
| sistats mean(uploads) median(uploads) mode(uploads) by uploads
| append
[ stats count as psrsvd_sm_uploads
| eval psrsvd_ct_uploads= <number of entries with 0 uploads>
| eval psrsvd_gc=psrsvf_ct_uploads
| eval psrsvd_nc_uploads=psrsvd_ct_uploads
| eval psrsvd_rd_uploads="0e+00"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_sm_uploads=0
| eval psrsvd_ss_uploads=0
| eval psrsvd_v=1
| eval psrsvd_vm_uploads="0;"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_vt_uploads=0
| eval uploads=0
| stats count as frequency by uploads
```

if you want, you can generate the subsearch into a file with outputcsv or outputlookup, then fetch it back in and append it.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

gkanapathy

Splunk Employee

06-28-2011
06:39 PM

If you're frequency table is generated by you running `stats count`

, and then appending the other items, you can do it most efficiently with:

```
source=mysourcedata
| sistats mean(uploads) median(uploads) mode(uploads) by uploads
| append
[ stats count as psrsvd_sm_uploads
| eval psrsvd_ct_uploads= <number of entries with 0 uploads>
| eval psrsvd_gc=psrsvf_ct_uploads
| eval psrsvd_nc_uploads=psrsvd_ct_uploads
| eval psrsvd_rd_uploads="0e+00"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_sm_uploads=0
| eval psrsvd_ss_uploads=0
| eval psrsvd_v=1
| eval psrsvd_vm_uploads="0;"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_vt_uploads=0
| eval uploads=0
| stats mean(uploads) median(uploads) mode(uploads)
```

or, to get back the above frequency table:

```
source=mysourcedata
| sistats mean(uploads) median(uploads) mode(uploads) by uploads
| append
[ stats count as psrsvd_sm_uploads
| eval psrsvd_ct_uploads= <number of entries with 0 uploads>
| eval psrsvd_gc=psrsvf_ct_uploads
| eval psrsvd_nc_uploads=psrsvd_ct_uploads
| eval psrsvd_rd_uploads="0e+00"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_sm_uploads=0
| eval psrsvd_ss_uploads=0
| eval psrsvd_v=1
| eval psrsvd_vm_uploads="0;"+tostring(psrsvd_ct_uploads)+";"
| eval psrsvd_vt_uploads=0
| eval uploads=0
| stats count as frequency by uploads
```

if you want, you can generate the subsearch into a file with outputcsv or outputlookup, then fetch it back in and append it.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

gkanapathy

Splunk Employee

06-28-2011
06:47 PM

*that* hard, but left as an exercise.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

gkanapathy

Splunk Employee

06-28-2011
04:24 PM

No easy way to compute all of these in a single simple query.

mean:

```
... | stats
sum(uploads) as totaluploads
sum(frequency) as totalcount
| eval mean=totaluploads/totalcount
```

mode:

```
... | sort - frequency | head 1 | eval mode=uploads
```

median:

```
... | sort uploads
| eventstats sum(frequency) as totalcount | eval medianrank=floor((totalcount/2))
| streamstats global=f current=t window=0
sum(frequency) as maxrank
| where (maxrank-frequency <= medianrank) AND (medianrank < frequency)
| eval median = uploads
```

trickery to combine them:

```
... | sort - frequency
| eventstats first(uploads) as mode
| sort uploads
| eventstats
sum(frequency) as totalcount
sum(uploads) as totaluploads
| eval mean=totaluploads/totalcount
| eval medianrank=floor((totalcount/2))
| streamstats current=t window=0
sum(frequency) as maxrank
| eval median=if((maxrank-frequency <= medianrank) AND (medianrank < frequency), uploads, null())
| where isnotnull(median)
```

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

jyzhang

Engager

06-28-2011
05:43 PM

Get Updates on the Splunk Community!

Watch On DemandMalware. Risky Extensions. Data Exfiltration. End-users are increasingly reliant on browsers to ...

The fifth leaderboard update for The Great Resilience Quest is out >>
🏆 Check out the ...

At this year’s Splunk University, I had the privilege of chatting with Devesh Logendran, one of the winners in ...