Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Splunk Search

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Community
- :
- Splunk Answers
- :
- Using Splunk
- :
- Splunk Search
- :
- sum(x/y) and sum(x)/sum(y)

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sum(x/y) and sum(x)/sum(y)

inglisn

Path Finder

09-14-2011
05:06 AM

This has stumped me for too long so I'm opening it up to the experts.

I have some event data of format "timestamp, Category, SubCategory, X, Y". The data has already been processed by the source system so that there are, say, 5000 events per timestamp. I want to filter on the Category, leaving about 500 relevant events to process.

X is a data volume (e.g. through an interface). I can generate the total using sum(X).

Y gives me the number of users. There is a fixed value of Y per SubCategory, so that for a given Category values(Y) would contain 4 or 5 data points. This is not really a problem as I can return the correct value in a subsearch through "join Category,SubCategory [search | stats | mvexpand | stats ]"

I want to calculate the average per-user volume for X for a given category and also for each subCategory within the category.

Using stats gives me:

SubCategory UsersInSubCategory sum(X) sum(X/Y)

A 100 100MB 1MB

B 200 200MB 1MB

Totals 300 300MB 2MB

This is correct when breaking out by SubCategory, but for the whole Category I cannot use sum(X/Y) as what I want is sum(X)/sum(Y). Since the underlying events are in the thousands avg(X/Y) gives me averages per event, not per SubCategory.

I'm beginning to think two queries might be easier.

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: sum(x/y) and sum(x)/sum(y)

sophy

Splunk Employee

09-14-2011
10:20 AM

hi! have you tried using the eval command to calculate sum(X)/sum(Y)?

```
... | stats sum(X) AS x sum(Y) AS y | eval r=x/y
```

you can read more about eval in the search reference: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval

Highlighted
##

I did try that. I was expecting that the first stats would generate a table which can be piped onwards through more functions so I could generate averages over that summary table (4 or 5 rows) however I cant get the value of 'r' to appear anywhere. Piping the above into another stats results in no results.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: sum(x/y) and sum(x)/sum(y)

inglisn

Path Finder

09-15-2011
02:35 AM

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: sum(x/y) and sum(x)/sum(y)

acdevlin

Communicator

09-15-2011
09:23 AM

Instead of piping from one stats command into a second stats command, why not try using eventstats to calculate the sums?

```
... | eventstats sum(X) as sumX, sum(Y) as sumY | eval r=sumX/sumY | stats values(r)
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: sum(x/y) and sum(x)/sum(y)

inglisn

Path Finder

09-16-2011
02:31 AM

OK, so I played around an piping into eval started working. Go figure. Here's my current solution:

search and filter on Category

| stats first(*time) as timestr, values(Y) as y, sum(X) as x by date*year, date*month, date*mday, date_hour, SubCategory

| convert timeformat="%Y-%m-%d %H:00" ctime(timestr) as DateTime

| stats sum(y) as y, sum(x) as x by DateTime, SubCategory

| eval r=x/y

| chart sum(r) by DateTime, SubCategory

The raw data is grouped into 300s clumps comprising several thousand unique events. The first stats command creates the period over which the summaries are built and includes a timestamp which is used by the second stats to build the actual table. Then we eval the average and pipe into chart to produce something that can be put into a graph.

This allows me to keep or drop the SubCategory grouping in the second stats and the chart clauses and calculate the correct average over the Category.

By using stats I've lost the event timestamp so needed to manually re-insert it. I works, but not as nicely as timechart as the X-Axis strings are quite long and a bit ugly when plotting over a week (168 points per series). Changing it to summarise over different time periods would involve changing the 2nd and 3rd lines.