I am comparing the results of the following two searches for one user id:
source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | stats avg(eventcount) avg(duration) by user_id
which returns the following for this user id: (the same for mean(eventcount)
user_id avg(eventcoun avg(duration)
4f7b35d0d93d056a5c000028 6.000000 2297.694808
And:
source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | search user_id="4f7b35d0d93d056a5c000028"
which displays the following info when I click on the eventcount field in the left column:
Min: 2 Max: 8 Mean: 4 Stdev: 3.098
Values # %
2 4 66.667%
8 2 33.333%
Based on the above data, the average for this user_id should be calculated to 4, not 6 which is returned by the first search query. avg(duration) has the same issue and is calulated too high by the first search query.
Any ideas what is going on here? how to fix this?
It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).
Because you specified mvlist=t
in transaction
, user_id
was created as multi-valued field. The stats
command operates on multi-valued group-by fields by treating them as if each value represented a separate event. However, eventcount only appears once in the data, and the "interesting fields" only displays its count and average of the entire number of resulting complete transactions. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id
), and two transactions with eight lines each (and eight occurrences of the user_id
). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2, and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2 = 4).
It is quite easy to see if you add count(eventcount)
to your results. In that case, the stats
command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.
It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).
Because you specified mvlist=t
in transaction
, user_id
was created as multi-valued field. The stats
command operates on multi-valued group-by fields by treating them as if each value represented a separate event. However, eventcount only appears once in the data, and the "interesting fields" only displays its count and average of the entire number of resulting complete transactions. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id
), and two transactions with eight lines each (and eight occurrences of the user_id
). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2, and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2 = 4).
It is quite easy to see if you add count(eventcount)
to your results. In that case, the stats
command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.