Splunk Search

How do I create an multivalue field after stats that don't include it?

Explorer

Is there a way to aggregate data and then show additional fields as mv fields without running another search? I want to aggregate on only 2 fields but then show additional "thrown away" fields as multi value in the final results.

Sample data:

User    Machine Function
Adam    1           x
Adam    1           y
Adam    1           y
Betty   2           x

Working query:

stats count by User, Machine, Function

Results:

User    Machine     Function    Count
A       1           x           1
A       1           y           2
B       2           x           1

Tried query (doesn't work):

stats count by User, Machine | mvcombine Function | stats sum(Count), User, Machine, Function

Desired results:

User    Machine     Function    Count
A       1           x           3
                      y
B       2           x           1
0 Karma

I think this query will get you the table you want:

| stats values(Function) AS Function, count AS Count BY User, Machine

Explorer

A similar query over a week's worth of my real data shows 260 seconds for the eventstats version and 330 seconds for the values version. Yours is simpler to write but not as quick. Will confirm by running both again though.

0 Karma

Explorer

You beat me to my own answer but my approach was slightly different. Waiting for the moderator to approve it but basically I used eventstats which keeps the other values:
| eventstats count by User, Machine | mvcombine Function | eval Function=mvdedup(Function)

0 Karma

Explorer

Actually I think I just answered my own question, eventstats keeps the data in a form I can combine:

eventstats count by User, Machine | mvcombine Function

Results:

Function    Machine User    count
x           1       Adam    3
y
y
x           2       Betty   1
0 Karma

Explorer

So now my question, which approach is "better" as far as query costs would be concerned?

0 Karma

Explorer

Eventstats with many other fields has to be cleaned up, however by removing _* fields and specifying the exact fields to keep, so it becomes:
eventstats count by User, Machine | fields Count, User, Machine, Function | fields - _* | mvcombine Function | eval Function=mvdedup(Function) | sort -Count

0 Karma

You can certainly test both approaches over your data set to see which completes more efficiently, but I'm pretty sure my approach is more efficient. Stats is almost always more efficient than eventstats, I believe, specifically because it discards extra fields immediately.

Explorer

I would have thought the same, but wanted to know for sure.

So I've run each version of the query over and over again, both for short durations and for long. The variability of server responses while everyone else is running queries makes this hard to test. However, when running them simultaneously, they are mostly in a dead heat (~977 seconds over 30 days, ~10-15 seconds over 4 hours).

When running one of them over and over again, and then doing the same with the other, the average performance of the eventstats is slightly better. I think what might be going on is that while eventstats is using more memory during the query, this doesn't cost much in the way of performance. Stats values, on the other hand, might be wasting performance in upkeep while it throws things away.

Fair enough! Thanks for running the experiments and sharing your results. I've definitely found with Splunk that it's worth the time to run experiments like this up front to tune my approaches, because what works well for one data set may not be as efficient for another. So you're very smart to set up these A/B tests and choose your final approach accordingly.

0 Karma

Explorer

I pasted the wrong "tried" query but it doesn't really matter. I'm just trying to show that I can't use mvcombine once stats throws away the other fields. The alternative, to mvcombine at the end, can't combine if the Count column changes.

0 Karma