This has stumped me for too long so I'm opening it up to the experts.
I have some event data of format "timestamp, Category, SubCategory, X, Y". The data has already been processed by the source system so that there are, say, 5000 events per timestamp. I want to filter on the Category, leaving about 500 relevant events to process.
X is a data volume (e.g. through an interface). I can generate the total using sum(X).
Y gives me the number of users. There is a fixed value of Y per SubCategory, so that for a given Category values(Y) would contain 4 or 5 data points. This is not really a problem as I can return the correct value in a subsearch through "join Category,SubCategory [search | stats | mvexpand | stats ]"
I want to calculate the average per-user volume for X for a given category and also for each subCategory within the category.
Using stats gives me:
SubCategory UsersInSubCategory sum(X) sum(X/Y)
A 100 100MB 1MB
B 200 200MB 1MB
Totals 300 300MB 2MB
This is correct when breaking out by SubCategory, but for the whole Category I cannot use sum(X/Y) as what I want is sum(X)/sum(Y). Since the underlying events are in the thousands avg(X/Y) gives me averages per event, not per SubCategory.
I'm beginning to think two queries might be easier.
... View more