Splunk Search

average(eventcount) applied to transactions returns the wrong value sometimes

Path Finder

I am comparing the results of the following two searches for one user id:

source="xxxx" | transaction userid, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | stats avg(eventcount) avg(duration) by userid

which returns the following for this user id: (the same for mean(eventcount)

     user_id                    avg(eventcoun     avg(duration)

4f7b35d0d93d056a5c000028 6.000000 2297.694808

And:

source="xxxx" | transaction userid, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | search userid="4f7b35d0d93d056a5c000028"

which displays the following info when I click on the eventcount field in the left column:

Min: 2 Max: 8 Mean: 4 Stdev: 3.098

Values # %

2 4 66.667%

8 2 33.333%

Based on the above data, the average for this user_id should be calculated to 4, not 6 which is returned by the first search query. avg(duration) has the same issue and is calulated too high by the first search query.
Any ideas what is going on here? how to fix this?

0 Karma
1 Solution

Splunk Employee
Splunk Employee

It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).

Because you specified mvlist=t in transaction, user_id was created as multi-valued field. The stats command operates on multi-valued group-by fields by treating them as if each value represented a separate event. However, eventcount only appears once in the data, and the "interesting fields" only displays its count and average of the entire number of resulting complete transactions. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id), and two transactions with eight lines each (and eight occurrences of the user_id). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2, and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2 = 4).

It is quite easy to see if you add count(eventcount) to your results. In that case, the stats command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.

View solution in original post

0 Karma

Splunk Employee
Splunk Employee

It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).

Because you specified mvlist=t in transaction, user_id was created as multi-valued field. The stats command operates on multi-valued group-by fields by treating them as if each value represented a separate event. However, eventcount only appears once in the data, and the "interesting fields" only displays its count and average of the entire number of resulting complete transactions. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id), and two transactions with eight lines each (and eight occurrences of the user_id). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2, and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2 = 4).

It is quite easy to see if you add count(eventcount) to your results. In that case, the stats command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.

View solution in original post

0 Karma