This has stumped me for too long so I'm opening it up to the experts.
I have some event data of format "timestamp, Category, SubCategory, X, Y". The data has already been processed by the source system so that there are, say, 5000 events per timestamp. I want to filter on the Category, leaving about 500 relevant events to process.
X is a data volume (e.g. through an interface). I can generate the total using sum(X).
Y gives me the number of users. There is a fixed value of Y per SubCategory, so that for a given Category values(Y) would contain 4 or 5 data points. This is not really a problem as I can return the correct value in a subsearch through "join Category,SubCategory [search | stats | mvexpand | stats ]"
I want to calculate the average per-user volume for X for a given category and also for each subCategory within the category.
Using stats gives me:
SubCategory UsersInSubCategory sum(X) sum(X/Y)
A 100 100MB 1MB
B 200 200MB 1MB
Totals 300 300MB 2MB
This is correct when breaking out by SubCategory, but for the whole Category I cannot use sum(X/Y) as what I want is sum(X)/sum(Y). Since the underlying events are in the thousands avg(X/Y) gives me averages per event, not per SubCategory.
I'm beginning to think two queries might be easier.
OK, so I played around an piping into eval started working. Go figure. Here's my current solution:
search and filter on Category
| stats first(_time) as timestr, values(Y) as y, sum(X) as x by date_year, date_month, date_mday, date_hour, SubCategory
| convert timeformat="%Y-%m-%d %H:00" ctime(timestr) as DateTime
| stats sum(y) as y, sum(x) as x by DateTime, SubCategory
| eval r=x/y
| chart sum(r) by DateTime, SubCategory
The raw data is grouped into 300s clumps comprising several thousand unique events. The first stats command creates the period over which the summaries are built and includes a timestamp which is used by the second stats to build the actual table. Then we eval the average and pipe into chart to produce something that can be put into a graph.
This allows me to keep or drop the SubCategory grouping in the second stats and the chart clauses and calculate the correct average over the Category.
By using stats I've lost the event timestamp so needed to manually re-insert it. I works, but not as nicely as timechart as the X-Axis strings are quite long and a bit ugly when plotting over a week (168 points per series). Changing it to summarise over different time periods would involve changing the 2nd and 3rd lines.
hi! have you tried using the eval command to calculate sum(X)/sum(Y)?
... | stats sum(X) AS x sum(Y) AS y | eval r=x/y
you can read more about eval in the search reference: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
Instead of piping from one stats command into a second stats command, why not try using eventstats to calculate the sums?
... | eventstats sum(X) as sumX, sum(Y) as sumY | eval r=sumX/sumY | stats values(r)
I did try that. I was expecting that the first stats would generate a table which can be piped onwards through more functions so I could generate averages over that summary table (4 or 5 rows) however I cant get the value of 'r' to appear anywhere. Piping the above into another stats results in no results.