Hey guys!
I need the statistics of a bunch of data by month. And this is done already.
search
|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month | reverse
The output is what I want:
Month | nobs | mean | min |
2023 06 | 1900 | -5.0239778 | -68.73417 |
2023 05 | 3562 | -4.2430259 | -67.134697 |
2023 04 | 3181 | -4.1811658 | -64.995394 |
2023 03 | 4274 | -4.3373071 | -134.20177 |
2023 02 | 3939 | -4.7725011 | -73.538274 |
2023 01 | 2868 | -5.5231115 | -41.056093 |
2022 12 | 395 | -4.617424 | -35.51642 |
Now I want to add another row at the bottom, called 'All', like this
Month | nobs | mean | min |
2023 06 | 1900 | -5.0239778 | -68.73417 |
2023 05 | 3562 | -4.2430259 | -67.134697 |
2023 04 | 3181 | -4.1811658 | -64.995394 |
2023 03 | 4274 | -4.3373071 | -134.20177 |
2023 02 | 3939 | -4.7725011 | -73.538274 |
2023 01 | 2868 | -5.5231115 | -41.056093 |
2022 12 | 395 | -4.617424 | -35.51642 |
All | z.zz | x.xx | y.yy |
Here, the result for x.xx, y.yy and z.zz should be from the whole search result, instead of from the statistics shown in the rows. I am wondering how to do that? Ideally, I can use the search result without the need to search again and degrade performance. Thanks!
Hi @qqzj,
Here's a way of getting two sets of different stats by using the appendpipe command:
| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
``` Create random test data ```
``` Calculate the stats for the whole data set ```
| appendpipe[|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min| eval Month="All"]
``` Calculate the stats for each month ```
| appendpipe [|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month]
``` Filter out the stats we just made, sorted reverse chronological order```
| search Month=*
| table Month, nobs, mean, min
| eval sort=if(Month="All", 0, Month)
| sort 0 - sort | fields - sort
It takes the output of the search so far, and runs the stats command. The resulting fields are added to the search result set.
Next, we run stats on the output of the search again, but this time we calculate the monthly values.
Finally, we throw away the source data and just select the stats we created.
That gives you a table that looks like this:
Cheers,
Daniel
First look at the mathematics. Total nobs is just a sum. All time min is just minimum of all monthly minimums. To calculate mean, you just sum up mean*nobs, then divide by total nobs. In SPL, that is
search
| eval Month=strftime(_time,"%Y %m")
| stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month
| reverse
| appendpipe
[ stats sum(nobs) as nobs min(min) as min sum(eval(nobs * mean)) as mean
| eval mean = mean / nobs ]
Thanks for the help! I guess it is my fault that I did not include 'median' in the column in the example, even though I included median in the title of the question. Your method works for mean, min, and max. But not for median. I also need to deal with percentiles actually. So I need a way to direct use the result of the search, instead of use the results being processed once. Thanks!
Hi @qqzj,
Here's a way of getting two sets of different stats by using the appendpipe command:
| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
``` Create random test data ```
``` Calculate the stats for the whole data set ```
| appendpipe[|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min| eval Month="All"]
``` Calculate the stats for each month ```
| appendpipe [|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month]
``` Filter out the stats we just made, sorted reverse chronological order```
| search Month=*
| table Month, nobs, mean, min
| eval sort=if(Month="All", 0, Month)
| sort 0 - sort | fields - sort
It takes the output of the search so far, and runs the stats command. The resulting fields are added to the search result set.
Next, we run stats on the output of the search again, but this time we calculate the monthly values.
Finally, we throw away the source data and just select the stats we created.
That gives you a table that looks like this:
Cheers,
Daniel
Hi danspav, I am very new to Splunk. So may I ask for a few extra helps?
1 What is the meaning of ' search Month=*'? I deleted it and the code still works. So I am not sure whether I should keep it or not.
2 What is the meaning of
sort 0 - sort
Thanks!
sort 0 -sort. I think I figured this one out. sort 0 means including all records. -sort means using the descending order of the new variable created. Cool.
Hi,
Yes, sort 0 - sort will reverse sort by the "sort" field. The zero means sort all fields.
This bit:
| search Month=*
Is used to filter out unneeded data. In our results set we have the raw data from the index and our calculated stats. We only want the stats, so we search for all rows that have a field called Month that is not null.
If your search results look ok without it, then you can safely remove this bit.
Actually, you could get simpler than that still.
| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
| eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month | reverse
``` Create random test data ```
``` Work out Max, min, mean from the data ```
| eval means = nobs * mean
| appendpipe[|stats sum(nobs) as nobs, sum(means) as mean, min(min) as min| eval mean=mean/nobs, Month="All" ]
| fields - means
Comparing the two methods, this one is accurate to twelve or so decimal places.
Hi danspav, THANK YOU!!!
I forgot to include 'median' as a column in the example, so your second, shorter, method does not work well for my purpose. But your first method works like magic!