Splunk Search

average(eventcount) applied to transactions returns the wrong value sometimes

fere
Path Finder

I am comparing the results of the following two searches for one user id:

source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | stats avg(eventcount) avg(duration) by user_id

which returns the following for this user id (the same for mean(eventcount)):

user_id                    avg(eventcount)   avg(duration)
4f7b35d0d93d056a5c000028   6.000000          2297.694808

And:

source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | search user_id="4f7b35d0d93d056a5c000028"

which displays the following info when I click on the eventcount field in the left column:

Min: 2 Max: 8 Mean: 4 Stdev: 3.098

Values   #   %
2        4   66.667%
8        2   33.333%

Based on the above data, the average for this user_id should be 4, not the 6 returned by the first search. avg(duration) has the same issue: the first search calculates it too high.
Any ideas what is going on here, and how to fix it?

1 Solution

gkanapathy
Splunk Employee

It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).

Because you specified mvlist=t in transaction, user_id was created as a multi-valued field. The stats command operates on multi-valued group-by fields by treating each value as if it represented a separate event. However, eventcount appears only once per transaction, and the "Interesting Fields" summary computes its count and average over the resulting complete transactions, one value per transaction. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id), and two transactions with eight lines each (and eight occurrences of user_id). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2 and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2) = 4.

It is quite easy to see if you add count(eventcount) to your results. In that case, the stats command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.
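
To make the arithmetic concrete, here is a minimal Python sketch of the two computations. It is not Splunk code and does not reproduce how stats is implemented; it only mimics the counting described above. The list of six transactions (four with eventcount 2, two with eventcount 8) is an assumption taken from the "Interesting Fields" numbers in the question, and the variable names are made up for illustration.

```python
# Assumed data: the eventcount of each of the six transactions for this
# user_id (four transactions of 2 events, two transactions of 8 events).
transactions = [2, 2, 2, 2, 8, 8]

# "Interesting Fields" view: every transaction contributes its eventcount once.
mean_per_transaction = sum(transactions) / len(transactions)

# stats ... by user_id with mvlist=t: user_id carries one value per event,
# so each transaction is counted once per user_id value, i.e. eventcount times.
expanded = [ec for ec in transactions for _ in range(ec)]
mean_expanded = sum(expanded) / len(expanded)

print(len(transactions), mean_per_transaction)  # 6 4.0
print(len(expanded), mean_expanded)             # 24 6.0
```

The second line of output matches the 24 items and the average of 6 that stats reports, while the first matches the 6 transactions and the mean of 4 shown in the field summary.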
