Splunk Search

Why does using eventstats result in seemingly lost data at a certain number of events?

New Member

I'm trying to calculate the percentage of a specific account's usage. To do this, I'm calculating the usage across all events, then adding the usage on a per account basis and dividing that by the total. A test search that I'm using to try and figure out where things are getting lost looks like this:

```
... | table account, usage
| eventstats sum(usage) as total
| eventstats sum(usage) as usageByAccount by account
| dedup account
| eventstats sum(usageByAccount) as checkedTotal
```

At a 4-hour time range, total equals checkedTotal exactly.
As I expand it out closer to 24 hours (with the goal of going to 30 days), checkedTotal is significantly less than total.

In trying to nail down where it breaks, it's around 530k events, and the number of distinct accounts in that time period is about 180k.

Any ideas on what's going wrong, or hopefully easier - what I'm doing wrong?

1 Solution
Legend

The documentation for the eventstats command says "In the limits.conf file, the max_mem_usage_mb parameter is used to limit how much memory the stats a..."

The search job inspector may be able to tell you if this is happening.

But you can compute this more simply by doing the following:

```
yoursearchhere
| stats sum(usage) as usageByAccount by account
| eventstats sum(usageByAccount) as total
| eval percentageByAccount = round(usageByAccount*100/total,1) . "%"
```

This will also be faster and use a lot less memory. My solution uses the stats command to produce a results table of usage by account. This automatically dedups accounts and creates a single line in the table for each account (aka a result). After that point, the eventstats command operates over those results, not the individual events.
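For intuition, here is a rough Python analogue of what that SPL pipeline computes (the event records and values are made up for illustration; this is a sketch of the logic, not how Splunk executes the search):

```python
from collections import defaultdict

# Hypothetical raw events, standing in for what yoursearchhere would return.
events = [
    {"account": "a", "usage": 10},
    {"account": "b", "usage": 30},
    {"account": "a", "usage": 20},
    {"account": "c", "usage": 40},
]

# | stats sum(usage) as usageByAccount by account
# Collapses the events into one result per account.
usage_by_account = defaultdict(int)
for e in events:
    usage_by_account[e["account"]] += e["usage"]

# | eventstats sum(usageByAccount) as total
# Now runs over the small per-account result set, not every event.
total = sum(usage_by_account.values())

# | eval percentageByAccount = round(usageByAccount*100/total,1) . "%"
percentage_by_account = {
    acct: f"{round(u * 100 / total, 1)}%"
    for acct, u in usage_by_account.items()
}
```

The key point is that aggregating with stats first shrinks the data before eventstats ever sees it, which is why this avoids the memory limit that the original all-events eventstats can hit.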

IMO, your use of eventstats will always be problematic, even if you raise your memory limits.

New Member

Thanks for the pointer. Yeah, doing a stats sum by account did the trick. I was able to verify that total matches the overall total, getting me what I need.

I did try upping max_mem_usage_mb and it didn't help, nor did the job inspector indicate it was hitting a limit - but the problem is dealt with.

Thanks!
