Splunk Search

How to add total median and min to median and min by month?

qqzj
Explorer

Hey guys!

 

I need the statistics of a bunch of data by month. And this is done already.

 

search
|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month | reverse

 

The output is what I want:

Monthnobsmeanmin
2023 061900-5.0239778-68.73417
2023 053562-4.2430259-67.134697
2023 043181-4.1811658-64.995394
2023 034274-4.3373071-134.20177
2023 023939-4.7725011-73.538274
2023 012868-5.5231115-41.056093
2022 12395-4.617424-35.51642

 

Now I want to add another row at the bottom, called 'All', like this

Monthnobsmeanmin
2023 061900-5.0239778-68.73417
2023 053562-4.2430259-67.134697
2023 043181-4.1811658-64.995394
2023 034274-4.3373071-134.20177
2023 023939-4.7725011-73.538274
2023 012868-5.5231115-41.056093
2022 12395-4.617424-35.51642
Allz.zzx.xxy.yy

 

Here, the result for x.xx, y.yy and z.zz should be from the whole search result, instead of from the statistics shown in the rows. I am wondering how to do that? Ideally, I can use the search result without the need to search again and degrade performance. Thanks!

 

Labels (1)
0 Karma
1 Solution

danspav
SplunkTrust
SplunkTrust

Hi @qqzj,

Here's a way of getting two sets of different stats by using the appendpipe command:

| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
``` Create random test data ```

``` Calculate the stats for the whole data set ```
| appendpipe[|eval Month=strftime(_time,"%Y %m")  | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min| eval Month="All"]

``` Calculate the stats for each month ```
| appendpipe [|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month]

``` Filter out the stats we just made, sorted reverse chronological order```
| search Month=*
| table Month, nobs, mean, min
| eval sort=if(Month="All", 0, Month)
| sort 0 - sort | fields - sort

 

It takes the output of the search so far, and runs the stats command. The resulting fields are added to the search result set.

Next, we run stats on the output of the search again, but this time we calculate the monthly values.

Finally, we throw away the source data and just select the stats we created.

That gives you a table that looks like this:

danspav_0-1687502334469.png


Cheers,
Daniel

View solution in original post

yuanliu
SplunkTrust
SplunkTrust

First look at the mathematics.  Total nobs is just a sum.  All time min is just minimum of all monthly minimums.  To calculate mean, you just sum up mean*nobs, then divide by total nobs.  In SPL, that is

search
| eval Month=strftime(_time,"%Y %m")
| stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month
| reverse
| appendpipe
    [ stats sum(nobs) as nobs min(min) as min sum(eval(nobs * mean)) as mean
    | eval mean = mean / nobs ]

 

Tags (1)

qqzj
Explorer

Thanks for the help! I guess it is my fault that I did not include 'median' in the column in the example, even though I included median in the title of the question. Your method works for mean, min, and max. But not for median. I also need to deal with percentiles actually. So I need a way to direct use the result of the search, instead of use the results being processed once. Thanks!

0 Karma

danspav
SplunkTrust
SplunkTrust

Hi @qqzj,

Here's a way of getting two sets of different stats by using the appendpipe command:

| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
``` Create random test data ```

``` Calculate the stats for the whole data set ```
| appendpipe[|eval Month=strftime(_time,"%Y %m")  | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min| eval Month="All"]

``` Calculate the stats for each month ```
| appendpipe [|eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month]

``` Filter out the stats we just made, sorted reverse chronological order```
| search Month=*
| table Month, nobs, mean, min
| eval sort=if(Month="All", 0, Month)
| sort 0 - sort | fields - sort

 

It takes the output of the search so far, and runs the stats command. The resulting fields are added to the search result set.

Next, we run stats on the output of the search again, but this time we calculate the monthly values.

Finally, we throw away the source data and just select the stats we created.

That gives you a table that looks like this:

danspav_0-1687502334469.png


Cheers,
Daniel

qqzj
Explorer

Hi danspav, I am very new to Splunk. So may I ask for a few extra helps?

1 What is the meaning of ' search Month=*'? I deleted it and the code still works. So I am not sure whether I should keep it or not.

2 What is the meaning of 

sort 0 - sort

 Thanks!

0 Karma

qqzj
Explorer

sort 0 -sort. I think I figured this one out. sort 0 means including all records. -sort means using the descending order of the new variable created. Cool.

0 Karma

danspav
SplunkTrust
SplunkTrust

Hi,

Yes, sort 0 - sort will reverse sort by the "sort" field. The zero means sort all fields. 

This bit:

| search Month=*


Is used to filter out unneeded data. In our results set we have the raw data from the index and our calculated stats.  We only want the stats, so we search for all rows that have a field called Month that is not null.
If your search results look ok without it, then you can safely remove this bit.


danspav
SplunkTrust
SplunkTrust

Actually, you could get simpler than that still.

| gentimes start=-217
| eval _time=starttime, mydata=-1*(random()%1000/100)
| eval Month=strftime(_time,"%Y %m") | stats count(mydata) AS nobs, mean(mydata) as mean, min(mydata) as min by Month | reverse
``` Create random test data ```

``` Work out Max, min, mean from the data ```
| eval means = nobs * mean
| appendpipe[|stats sum(nobs) as nobs, sum(means) as mean, min(min) as min| eval mean=mean/nobs, Month="All" ]
| fields - means

Comparing the two methods, this one is accurate to twelve or so decimal places.

qqzj
Explorer

Hi danspav, THANK YOU!!!

I forgot to include 'median' as a column in the example, so your second, shorter, method does not work well for my purpose. But your first method works like magic!

0 Karma
Get Updates on the Splunk Community!

3 Ways to Make OpenTelemetry Even Better

My role as an Observability Specialist at Splunk provides me with the opportunity to work with customers of ...

What's New in Splunk Cloud Platform 9.2.2406?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.2.2406 with many ...

Enterprise Security Content Update (ESCU) | New Releases

In August, the Splunk Threat Research Team had 3 releases of new security content via the Enterprise Security ...