I am trying to calculate the average for a few columns and rows but I have came across the following issue. Some rows might not have data for a particular column, but it seems these are still taken into account when calculating the avg for this column which means the data is way off. I'd like my formula to ignore a data point (if not present) when calculating the average for the column.
Here is my formula :
table ABC* | stats avg(ABC_*)
So if I have something like :
Column : ABC_1 | ABC_2 | ABC_3 | ABC_4 | ABC_5
1 2 1 2 2
2 2 1
1 2 3
The average for column ABC_2, ABC_4 and ABC_5 is incorrect.
Did you ever find a solution for this? I was doing what woodcock suggested in his reply to your question, but this rounds to a lower precision than the avg function.
I'm not sure that the problem is the avg function. Because the avg function returns the average of the values of field X. Means the null value of the field X is not taking into account by avg(X). In another words, in your table, avg(ABC_2) must return 2.
Check the problem elsewhere
It's quite simple really. Looking at the table I can clearly see what the result should be. Just adding the avg function causes it go off.
Use fillnull to provide values for NULL and try.
Using fillnull value=NULL seems to ignore any columns that have NULL in them.
Try this:
table ABC* | stats count(ABC*) AS countABC*, sum(ABC*) AS sumABC* | eval avg*=sumABC*/countABC*
I like this approach but I can't get it to work. I get the following error:
Error in 'eval' command: The expression is malformed. An unexpected character is reached at '/count_*'.
Here is how my search currently looks
Some initial event filtering | table time_svc_agg* | stats count(time_svc_agg_) AS count_, sum(time_svc_agg_) AS sum_ | eval avg_=sum_/count_*
It looks like you will have to iterate each variable as you need it in the "eval" part like this:
| table time_svc_agg* | stats count(time_svc_agg_*) AS count_*, sum(time_svc_agg_*) AS sum_* | eval avg_1=sum_1/count_1 | eval avg_2=sum_2/count_2 ...
So this introduces a new issue because I don't know beforehand how many of these time_svc_agg will exist, nor their names. Will I need a subquery to take care of this?
I encapsulated them in $$ (like eval avg_=$sum_$/$count_*$ ) but I still cant see any new columns with avg_ in my table