Is there a way to aggregate data and then show additional fields as mv fields without running another search? I want to aggregate on only 2 fields but then show additional "thrown away" fields as multi value in the final results.
User Machine Function Adam 1 x Adam 1 y Adam 1 y Betty 2 x
stats count by User, Machine, Function
User Machine Function Count A 1 x 1 A 1 y 2 B 2 x 1
Tried query (doesn't work):
stats count by User, Machine | mvcombine Function | stats sum(Count), User, Machine, Function
User Machine Function Count A 1 x 3 y B 2 x 1
A similar query over a week's worth of my real data shows 260 seconds for the eventstats version and 330 seconds for the values version. Yours is simpler to write but not as quick. Will confirm by running both again though.
You beat me to my own answer but my approach was slightly different. Waiting for the moderator to approve it but basically I used eventstats which keeps the other values:
| eventstats count by User, Machine | mvcombine Function | eval Function=mvdedup(Function)
Actually I think I just answered my own question, eventstats keeps the data in a form I can combine:
eventstats count by User, Machine | mvcombine Function
Function Machine User count x 1 Adam 3 y y x 2 Betty 1
Eventstats with many other fields has to be cleaned up, however by removing _* fields and specifying the exact fields to keep, so it becomes:
eventstats count by User, Machine | fields Count, User, Machine, Function | fields - _* | mvcombine Function | eval Function=mvdedup(Function) | sort -Count
You can certainly test both approaches over your data set to see which completes more efficiently, but I'm pretty sure my approach is more efficient. Stats is almost always more efficient than eventstats, I believe, specifically because it discards extra fields immediately.
I would have thought the same, but wanted to know for sure.
So I've run each version of the query over and over again, both for short durations and for long. The variability of server responses while everyone else is running queries makes this hard to test. However, when running them simultaneously, they are mostly in a dead heat (~977 seconds over 30 days, ~10-15 seconds over 4 hours).
When running one of them over and over again, and then doing the same with the other, the average performance of the eventstats is slightly better. I think what might be going on is that while eventstats is using more memory during the query, this doesn't cost much in the way of performance. Stats values, on the other hand, might be wasting performance in upkeep while it throws things away.
Fair enough! Thanks for running the experiments and sharing your results. I've definitely found with Splunk that it's worth the time to run experiments like this up front to tune my approaches, because what works well for one data set may not be as efficient for another. So you're very smart to set up these A/B tests and choose your final approach accordingly.
I pasted the wrong "tried" query but it doesn't really matter. I'm just trying to show that I can't use mvcombine once stats throws away the other fields. The alternative, to mvcombine at the end, can't combine if the Count column changes.