I often run into a case where I find I need to take the same dataset and compute aggregate statistics on different group-by sets, for instance if you want the output of this: index=example | stats avg(field1) by x,y,z
| append [ index=example | stats perc95(field2) by a,b,c ] I am using the case n=2 groupbys for convenience. In the general case there are N groupbys, and arbitrary stats functions... what is the best way to optimize this kind of query, without using append (which runs into subsearch limits)? Some of the patterns I can think of are below. One way is to use appendpipe. index=example
| appendpipe [ | stats avg(field1) by x,y,z ]
| appendpipe [ | stats perc95(field2) by a,b,c ] Unfortunately this seems kind of slow, especially once you start having to add more subsearches and preserving and passing a large number of non-transformed events throughout the search. Another way is to use eventstats to preserve the events data, finishing it off with a final stats. index=example
| eventstats avg(field1) as avg_field1 by x,y,z
| stats first(avg_field1) as avg_field1, perc95(field2) by a,b,c Unfortunately this is not much faster. I think there is another way using streamstats in place of eventstats, but I still haven't figured out how to retrieve the last event without just invoking eventstats last() or relying on an expensive sort. Another way I've tried is intentionally duplicating your data using mvexpand which has the best performance by far. index=example
```Duplicate all the data```
| eval key="1,2"
| makemv delim="," key
| mvexpand key
```Set groupby = concatenation of groupby field values```
| eval groupby=case(key=1,x.",".y.",".z, key2=a.",".b.",".c, true(), null())
| stats avg(field1), perc95(field2) by groupby Are there any other patterns that are easier/faster? I'm curious as to how Splunk processes things under the hood, I know something called "map-reduce" is part of it but would be curious to know if anyone knows how to optimize this computation and why it's optimal in a theoretical sense.
... View more