Splunk Search

Ignore empty data point when calculating average

ermosk
Engager

I am trying to calculate the average for a few columns and rows but I have came across the following issue. Some rows might not have data for a particular column, but it seems these are still taken into account when calculating the avg for this column which means the data is way off. I'd like my formula to ignore a data point (if not present) when calculating the average for the column.

Here is my formula :

table ABC* | stats avg(ABC_*)

So if I have something like :

Column : ABC_1 | ABC_2 | ABC_3 | ABC_4 | ABC_5
          1        2       1        2      2  
          2                2               1             
          1                2        3

The average for column ABC_2, ABC_4 and ABC_5 is incorrect.

Tags (2)
0 Karma

asturt
Explorer

Did you ever find a solution for this? I was doing what woodcock suggested in his reply to your question, but this rounds to a lower precision than the avg function.

0 Karma

stephanefotso
Motivator

I'm not sure that the problem is the avg function. Because the avg function returns the average of the values of field X. Means the null value of the field X is not taking into account by avg(X). In another words, in your table, avg(ABC_2) must return 2.
Check the problem elsewhere

0 Karma

ermosk
Engager

It's quite simple really. Looking at the table I can clearly see what the result should be. Just adding the avg function causes it go off.

0 Karma

srinathd
Contributor

Use fillnull to provide values for NULL and try.

0 Karma

ermosk
Engager

Using fillnull value=NULL seems to ignore any columns that have NULL in them.

0 Karma

woodcock
Esteemed Legend

Try this:

table ABC* | stats count(ABC*) AS countABC*, sum(ABC*) AS sumABC* | eval avg*=sumABC*/countABC*

ermosk
Engager

I like this approach but I can't get it to work. I get the following error:

Error in 'eval' command: The expression is malformed. An unexpected character is reached at '/count_*'.

Here is how my search currently looks

Some initial event filtering | table time_svc_agg* | stats count(time_svc_agg_) AS count_, sum(time_svc_agg_) AS sum_ | eval avg_=sum_/count_*

0 Karma

woodcock
Esteemed Legend

It looks like you will have to iterate each variable as you need it in the "eval" part like this:

| table time_svc_agg* | stats count(time_svc_agg_*) AS count_*, sum(time_svc_agg_*) AS sum_* | eval avg_1=sum_1/count_1 | eval avg_2=sum_2/count_2 ...
0 Karma

ermosk
Engager

So this introduces a new issue because I don't know beforehand how many of these time_svc_agg will exist, nor their names. Will I need a subquery to take care of this?

0 Karma

ermosk
Engager

I encapsulated them in $$ (like eval avg_=$sum_$/$count_*$ ) but I still cant see any new columns with avg_ in my table

0 Karma
.conf21 Now Fully Virtual!
Register for FREE Today!

We've made .conf21 totally virtual and totally FREE! Our completely online experience will run from 10/19 through 10/20 with some additional events, too!