Splunk Search

what are the calculated differences between stats and eventstats?

kingsizebk
Path Finder

The result from this search: index=_internal | eval something=case(kb!="0", "1") | stats sum(something) as sumST | stats avg(sumST)

Is different than the result from the this search: index=_internal | eval something=case(kb!="0", "1") | eventstats sum(something) as sumST | stats avg(sumST)

Even though I expect them to be exactly the same... Can anyone help me understand why?

Tags (3)
0 Karma
1 Solution

somesoni2
Revered Legend

You would see this difference in result (mostly on Avg only) as the dataset on which the Avg was created was different. Consider following example

Query: sourcetype=access status=200 action=purchase | eval sale=case(action="purchase", "1") | stats sum(sale) AS Sales count by date_mday date_hour

Result: (example) June 6 through June 8, 2014

date_mday    date_hour   Sales    count     ** this is no of rows at this hour
6               1          200      2        
6               10         500      2
6               15         700      2
7               2          100      1
7               7          400      2
7               12         900      1
7               22         100      1

Though there are total 11 entries, the avg will be created based on no of result from stats (7)

avg(Sales)=2900/7 = 414.2857142857143

Query: sourcetype=access status=200 action=purchase | eval sale=case(action="purchase", "1") | eventstats sum(sale) AS Sales by date_mday date_hour

Result: (example) June 6 through June 8, 2014

(one row for each events as eventstats just append columns with summary, doesn't summarize the data)
date_mday    date_hour   Sales           
6               1          200
6               1          200
6               10         500
6               10         500
6               15         700
6               15         700
7               2          100
7               7          400
7               7          400
7               12         900
7               22         100

Here the avg will be created based on all entries.

avg(Sales)=4700/11 = 427.2727272727273

View solution in original post

somesoni2
Revered Legend

You would see this difference in result (mostly on Avg only) as the dataset on which the Avg was created was different. Consider following example

Query: sourcetype=access status=200 action=purchase | eval sale=case(action="purchase", "1") | stats sum(sale) AS Sales count by date_mday date_hour

Result: (example) June 6 through June 8, 2014

date_mday    date_hour   Sales    count     ** this is no of rows at this hour
6               1          200      2        
6               10         500      2
6               15         700      2
7               2          100      1
7               7          400      2
7               12         900      1
7               22         100      1

Though there are total 11 entries, the avg will be created based on no of result from stats (7)

avg(Sales)=2900/7 = 414.2857142857143

Query: sourcetype=access status=200 action=purchase | eval sale=case(action="purchase", "1") | eventstats sum(sale) AS Sales by date_mday date_hour

Result: (example) June 6 through June 8, 2014

(one row for each events as eventstats just append columns with summary, doesn't summarize the data)
date_mday    date_hour   Sales           
6               1          200
6               1          200
6               10         500
6               10         500
6               15         700
6               15         700
7               2          100
7               7          400
7               7          400
7               12         900
7               22         100

Here the avg will be created based on all entries.

avg(Sales)=4700/11 = 427.2727272727273

View solution in original post

kingsizebk
Path Finder

a correction to something i said earlier... i am only seeing this with the tutorial data... the "internal" search above was different due to the above mentioned latest == now. using the tutorial data and these two searches: "sourcetype=access* status=200 action=purchase | eval sale=case(action="purchase", "1") | stats sum(sale) AS Sales by date_mday date_hour | stats avg(Sales)" and "sourcetype=access_* status=200 action=purchase | eval sale=case(action="purchase", "1") | eventstats sum(sale) AS Sales by date_mday date_hour | stats avg(Sales)"

0 Karma

kingsizebk
Path Finder

when i set earliest=1394596800 and latest=1394683200, still see the difference

0 Karma

somesoni2
Revered Legend

The value of latest is '06/10/2014 00:00:00'. Are you using 'now' as the latest? This can cause some difference as the value of latest will get changed whenever you run the query and the result will be based on different dataset. Can you try both the query with a timerange in the past (fixed)? e.g. earliest=1402200000 and latest=1402286400

0 Karma

kingsizebk
Path Finder

earliest=1402200000
latest=1402372800

CentOS release 6.5 (Final)

Linux wa1vmls3.ajcbjk.com 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Splunk 6.0.1 (build 189883)

is there any other info that i can provide?

i just ran the job inspector (for both the "stats" and "eventstats" searches) and printed the outputs to PDF. if there is anyway to post the PDF file, please let me know.

0 Karma

somesoni2
Revered Legend

I tried with exact same query and both are returning same data. Tried with various time frames as well. Could you provide more information?

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.