Splunk Search

average(eventcount) applied to transactions returns the wrong value sometimes

fere
Path Finder

I am comparing the results of the following two searches for one user id:

source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | stats avg(eventcount) avg(duration) by user_id

which returns the following for this user id: (the same for mean(eventcount)

     user_id                    avg(eventcoun     avg(duration)

4f7b35d0d93d056a5c000028 6.000000 2297.694808

And:

source="xxxx" | transaction user_id, pid keeporphans=f maxspan=70m maxpause=45m mvraw=t delim="," mvlist=t | search user_id="4f7b35d0d93d056a5c000028"

which displays the following info when I click on the eventcount field in the left column:

Min: 2 Max: 8 Mean: 4 Stdev: 3.098

Values # %

2 4 66.667%

8 2 33.333%

Based on the above data, the average for this user_id should be calculated to 4, not 6 which is returned by the first search query. avg(duration) has the same issue and is calulated too high by the first search query.
Any ideas what is going on here? how to fix this?

0 Karma
1 Solution

gkanapathy
Splunk Employee
Splunk Employee

It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).

Because you specified mvlist=t in transaction, user_id was created as multi-valued field. The stats command operates on multi-valued group-by fields by treating them as if each value represented a separate event. However, eventcount only appears once in the data, and the "interesting fields" only displays its count and average of the entire number of resulting complete transactions. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id), and two transactions with eight lines each (and eight occurrences of the user_id). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2, and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2 = 4).

It is quite easy to see if you add count(eventcount) to your results. In that case, the stats command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.

View solution in original post

0 Karma

gkanapathy
Splunk Employee
Splunk Employee

It does not return the wrong value. In each case, you are computing different averages (and stdevs, etc).

Because you specified mvlist=t in transaction, user_id was created as multi-valued field. The stats command operates on multi-valued group-by fields by treating them as if each value represented a separate event. However, eventcount only appears once in the data, and the "interesting fields" only displays its count and average of the entire number of resulting complete transactions. So in the first case, you have (probably) four transactions with two lines each (and two occurrences of user_id), and two transactions with eight lines each (and eight occurrences of the user_id). So, your average would be computed as (8x(8x2) + 2x(2x4))/(8x2 + 2x4) = 6. In the second case, you simply have 4 occurrences of 2, and 2 occurrences of 8, so the average is (2x4 + 8x2)/(4+2 = 4).

It is quite easy to see if you add count(eventcount) to your results. In that case, the stats command will return 24 items, while the "Interesting Fields" will show 6 transactions/events.

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...