Hi, I currently have a column chart with two bars for each day, one bar for reanalysis and one for resubmission. I want to add a third bar for each day that shows an average across both items, but I can't find a good way to add another field entry.
I can shape the data to get the Sum/count separately, but I'm not sure how to add a total. The shaped data looks like this:
| Analysis_type | Date | Sum | Count |
|---|---|---|---|
| reanalysis | 06/12 | 50 | 2 |
| resubmission | 06/12 | 400 | 4 |
| reanalysis | 06/13 | 55 | 5 |
| resubmission | 06/13 | 500 | 10 |
It should look like this in the final iteration:
| Analysis_type | Date | Average |
|---|---|---|
| reanalysis | 06/12 | 25 |
| resubmission | 06/12 | 100 |
| Total | 06/12 | 75 ((50 + 400) / (2 + 4)) |
| reanalysis | 06/13 | 11 |
| resubmission | 06/13 | 50 |
| Total | 06/13 | 37 ((55 + 500) / (5 + 10)) |
Let me know if more information is needed to handle this issue.
Here is the query I am using to generate the first chart.
index=cumulus_test1 AND (analysis_type="reanalysis" OR analysis_type="resubmission")
| dedup file_name
| eval SizeGB = ubf_size/1000000000
| eval month_num=strftime(_time,"%m")
| eval day_num=strftime(_time,"%d")
| eval date=month_num."/".day_num
| stats sum(SizeGB) AS Sum, count(file_name) AS count by analysis_type, date
Try like this:
**(Updated per comment)**
index=cumulus_test1 AND (analysis_type="reanalysis" OR analysis_type="resubmission")
| dedup file_name
| eval SizeGB = ubf_size/1000000000
| eval month_num=strftime(_time,"%m")
| eval day_num=strftime(_time,"%d")
| eval date=month_num."/".day_num
| stats sum(SizeGB) AS Sum, count(file_name) AS count by analysis_type, date
| appendpipe [| stats sum(Sum) as Sum sum(count) as count by date | eval analysis_type="Total" ]
| eval Average=Sum/count | fields - Sum count
| sort 0 analysis_type date
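If you then want the three-series column chart (x-axis = date, one bar each for reanalysis, resubmission, and Total), one option is to pivot the result so each date becomes a single row with one column per analysis_type. A minimal sketch, appended to the search above:
| xyseries date analysis_type Average
With that shape, a column chart should render one cluster of three bars per date.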
Thanks! That worked perfectly!
Without reading too much, have you tried putting
| addcoltotals
at the end?
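If it helps, appended to your search that might look something like this (labelfield and label are optional arguments that just name the summary row):
| addcoltotals labelfield=analysis_type label=Total Sum count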
I attempted that; however, it gives me the totals across the entire date range, when I want the totals to be done by date.
Just add this to the end of your existing search:
| eval Average = Sum / count | fields - Sum count
That gives me the average for both reanalysis and resubmission, but I'm trying to get an aggregated average across both. I updated the post with a bit more detail.
There's no need to be so snarky considering this is a place where people ask for help...
Here's the event flow:
I have single events that each have a ubf_size and one of two analysis types. I want an average by analysis type, and I also want an aggregated average across both analysis types for each day. I'm not sure what's unclear about that. I simplified it so you can see exactly what I mean: right now I have the first two rows, but I want to produce a third Total row that sums the Sum(ubf_size) and count(files) from reanalysis and resubmission in order to produce an aggregated average, while retaining the data for reanalysis and resubmission.
The data looks like this:

| Analysis Type | Date | Sum(ubf_size) | count(files) | Average |
|---|---|---|---|---|
| reanalysis | 06/12 | 10 | 5 | 2 |
| resubmission | 06/12 | 12 | 3 | 4 |

I want to add a row like this:

| Analysis Type | Date | Sum(ubf_size) | count(files) | Average |
|---|---|---|---|---|
| total | 06/12 | 22 | 8 | 2.75 |
The answer you gave me produces an average for both reanalysis and resubmission, but there is no "total" row.
Can you share your query?
It should look something like `... | stats values(Date), avg(Average) by Analysis_type`
Updated the post with my query
How did you calculate the averages in the second set of results?
Maybe `... | eval Average = round(Sum/Count, 2)`
Hope it helps.
My issue isn't calculating the average but taking an aggregated average of reanalysis and resubmission values and generating another entry for each date under a Total
Can you please elaborate a little more?
Sorry, I can't understand exactly what you are looking for, but in your example the math does not work: considering average = sum / count, your results do not reflect that.
Maybe you are referring to the | addcoltotals command?
It is tricky when adding averages. Or do you want to add averages?
Please share some more info so we can better assist.
Sorry, I was unclear and used filler numbers to get the point across. I've fixed it so the numbers all make sense.
Each day I produce files with a ubf_size. These files fall into two categories: reanalysis and resubmission. I want an average of the ubf_sizes for each category on a per-day basis. I also want an aggregated average that combines the two categories.
Taking the example from 06/12 above:
Reanalysis had 2 files for a total ubf_size of 50 GB, giving an average of 25.
Resubmission had 4 files for a total ubf_size of 400 GB, giving an average of 100.
I have no trouble getting the above two rows, but I want to somehow append another row that would look like:
Total: 6 files for a total ubf_size of 450 GB, giving an average of 75.
I need this while retaining the data for reanalysis and resubmission, and it needs to be done on a per-day basis.
| addcoltotals doesn't work because it totals across the entire dataset, when I want the totals per day. I don't want to add the averages themselves because that number would be incorrect, which is why I calculated the Sum/Count separately and computed the averages manually instead of using stats avg(ubf_size). I'm simply unsure how to manipulate the Sum/Count to get a separate total.
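To spell out why adding the averages would be wrong, using the 06/12 numbers above:
(25 + 100) / 2 = 62.5 (average of the two per-type averages)
(50 + 400) / (2 + 4) = 75 (the aggregated average I'm after)
So the Total row has to be built from the summed Sum and count, not from the per-type averages.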
Add this to your search:
... | stats sum(ubf_size) as total_gb count as total_count by date
| eval avg = total_gb / total_count
Yes, that would successfully give me the total I am looking for, but then I lose the statistics for reanalysis/resubmission. My final goal is a column chart where the x-axis is the date, the y-axis is the average, and each date has three columns (reanalysis, resubmission, total).
This question is so borked that maybe you should start over. I have NO IDEA what you are trying to do at this point. The path from your events to your "final iteration" is infinitely cloudy. I have no idea what math/logic you are trying to apply.
See my response to Adonio. I fixed up the numbers in the example to make sense.
As it is currently, I can generate the averages for both reanalysis and resubmission just fine. If that were all I needed, I could use stats avg(ubf_size) to achieve it.
The issue is that I want an aggregated average which combines the totals of both on a per-day basis, while retaining the reanalysis and resubmission statistics. To me, the best method seems to be calculating the Sum/Count separately and then somehow appending the summation, per day, under a new analysis_type called "Total", where
average = Sum(reanalysis + resubmission ubf_size) / Count(reanalysis + resubmission file count)