Solved: Re: How to add total median and min to median and ...

qqzj · ‎06-21-2023

Hey guys!

I need the statistics of a bunch of data by month. And this is done already.

search
|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month | reverse

The output is what I want:

Month	nobs	mean	min
2023 06	1900	-5.0239778	-68.73417
2023 05	3562	-4.2430259	-67.134697
2023 04	3181	-4.1811658	-64.995394
2023 03	4274	-4.3373071	-134.20177
2023 02	3939	-4.7725011	-73.538274
2023 01	2868	-5.5231115	-41.056093
2022 12	395	-4.617424	-35.51642

Now I want to add another row at the bottom, called 'All', like this

Month	nobs	mean	min
2023 06	1900	-5.0239778	-68.73417
2023 05	3562	-4.2430259	-67.134697
2023 04	3181	-4.1811658	-64.995394
2023 03	4274	-4.3373071	-134.20177
2023 02	3939	-4.7725011	-73.538274
2023 01	2868	-5.5231115	-41.056093
2022 12	395	-4.617424	-35.51642
All	z.zz	x.xx	y.yy

Here, the result for x.xx, y.yy and z.zz should be from the whole search result, instead of from the statistics shown in the rows. I am wondering how to do that? Ideally, I can use the search result without the need to search again and degrade performance. Thanks!

danspav · ‎06-22-2023

Hi @qqzj,

Here's a way of getting two sets of different stats by using the appendpipe command:

| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
``` Create random test data ```

``` Calculate the stats for the whole data set ```
| appendpipe[|eval Month=strftime(_time,"%Y %m")  | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min| eval Month="All"]

``` Calculate the stats for each month ```
| appendpipe [|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month]

``` Filter out the stats we just made, sorted reverse chronological order```
| search Month=*
| table Month, nobs, mean, min
| eval sort=if(Month="All", 0, Month)
| sort 0 - sort | fields - sort

It takes the output of the search so far, and runs the stats command. The resulting fields are added to the search result set.

Next, we run stats on the output of the search again, but this time we calculate the monthly values.

Finally, we throw away the source data and just select the stats we created.

That gives you a table that looks like this:

Cheers,
Daniel

View solution in original post

yuanliu · ‎06-22-2023

First look at the mathematics. Total nobs is just a sum. All time min is just minimum of all monthly minimums. To calculate mean, you just sum up mean*nobs, then divide by total nobs. In SPL, that is

search
| eval Month=strftime(_time,"%Y %m")
| stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month
| reverse
| appendpipe
    [ stats sum(nobs) as nobs min(min) as min sum(eval(nobs * mean)) as mean
    | eval mean = mean / nobs ]

qqzj · ‎06-26-2023

Thanks for the help! I guess it is my fault that I did not include 'median' in the column in the example, even though I included median in the title of the question. Your method works for mean, min, and max. But not for median. I also need to deal with percentiles actually. So I need a way to direct use the result of the search, instead of use the results being processed once. Thanks!

danspav · ‎06-22-2023

Hi @qqzj,

Here's a way of getting two sets of different stats by using the appendpipe command:

| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
``` Create random test data ```

``` Calculate the stats for the whole data set ```
| appendpipe[|eval Month=strftime(_time,"%Y %m")  | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min| eval Month="All"]

``` Calculate the stats for each month ```
| appendpipe [|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month]

``` Filter out the stats we just made, sorted reverse chronological order```
| search Month=*
| table Month, nobs, mean, min
| eval sort=if(Month="All", 0, Month)
| sort 0 - sort | fields - sort

It takes the output of the search so far, and runs the stats command. The resulting fields are added to the search result set.

Next, we run stats on the output of the search again, but this time we calculate the monthly values.

Finally, we throw away the source data and just select the stats we created.

That gives you a table that looks like this:

Cheers,
Daniel

qqzj · ‎06-26-2023

Hi danspav, I am very new to Splunk. So may I ask for a few extra helps?

1 What is the meaning of ' search Month=*'? I deleted it and the code still works. So I am not sure whether I should keep it or not.

2 What is the meaning of

sort 0 - sort

Thanks!

qqzj · ‎06-26-2023

sort 0 -sort. I think I figured this one out. sort 0 means including all records. -sort means using the descending order of the new variable created. Cool.

danspav · ‎06-26-2023

Hi,

Yes, sort 0 - sort will reverse sort by the "sort" field. The zero means sort all fields.

This bit:

| search Month=*

Is used to filter out unneeded data. In our results set we have the raw data from the index and our calculated stats. We only want the stats, so we search for all rows that have a field called Month that is not null.
If your search results look ok without it, then you can safely remove this bit.

danspav · ‎06-22-2023

Actually, you could get simpler than that still.

| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
| eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month | reverse
``` Create random test data ```

``` Work out Max, min, mean from the data ```
| eval means = nobs * mean
| appendpipe[|stats sum(nobs) as nobs, sum(means) as mean, min(min) as min| eval mean=mean/nobs, Month="All" ]
| fields - means

Comparing the two methods, this one is accurate to twelve or so decimal places.

qqzj · ‎06-26-2023

Hi danspav, THANK YOU!!!

I forgot to include 'median' as a column in the example, so your second, shorter, method does not work well for my purpose. But your first method works like magic!

How to add total median and min to median and min by month?

stats

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?