I am working with computer systems (for this question, the type of system is not important) that forward events to Splunk (7.3).
These events contain periodic snapshots of system performance indicators. For example, each system can perform up to some maximum number of tasks. Each event contains that number in a field named maxtasks. maxtasks can be different for each system, and the maxtasks for a system can change between events. Each event also contains a field named peaktasks that represents the peak number of tasks in that system over the period of time covered by that event.
I want to visualize peaktasks as a percentage of maxtasks for each system, and then show the 10 systems with the highest percentage value; that is, the systems that reached, or came closest to reaching, maxtasks.
The search command is fairly straightforward:
| stats max(eval(peaktasks/maxtasks*100)) as "% of maxtasks" by system
| sort - "% of maxtasks"
| head 10
The problem: I also want to include, in the same chart, the original peaktasks and maxtasks values that correspond to those max-aggregated percentages.
Here's my current solution:
| eval "% of maxtasks"= peaktasks/maxtasks * 100
| dedup system sortby -"% of maxtasks"
| table system, peaktasks, maxtasks, "% of maxtasks"
| sort - "% of maxtasks"
| head 10
I overlay % of maxtasks as a line over a column chart that shows maxtasks and peaktasks side by side for each system.
This works. At least, I think it does; the resulting data looks right, but this is my first use of the dedup command with sortby.
Here, I'm using eval in a separate step to insert a new percentage field into each event, then using dedup with sortby to keep only a single event (the one with the highest percentage) for each system.
I have a working solution. So, what's my question?
My main question (other than: is this doing what I think it's doing?): is there a more performant way to do this?
Depending on the time range involved, the base search might return many thousands of events. eval and dedup might not be the most performant choices.
Advice, recommendations welcome.
Try this:
| eval pct=peaktasks/maxtasks * 100
| stats count by system peaktasks maxtasks pct
| sort - pct
| rename pct as "% of maxtasks"
| head 10
| fields - count
Thanks! I had a nagging feeling that I was missing something that ought to have been obvious to me, and you've provided the answer: use the by clause of the stats command to include the original fields whose values I want to "preserve" after the aggregation.
Nit: as provided, your answer doesn't produce the results I need. It doesn't dedup by system. So, for my particular data, the results produced by your answer are all for the same system, because that system has many events with a higher percentage value than any other system.
Here's my modified version of your answer, replacing the sort with a dedup, producing the results I need:
| eval pct=peaktasks/maxtasks * 100
| stats count by system peaktasks maxtasks pct
| dedup system sortby -pct
| rename pct as "% of maxtasks"
| head 10
| fields - count
Interestingly, the Splunk Job Inspector indicates that this modified version of your answer takes longer—is less performant—than my solution.
For my solution, the Inspector reports:
This search has completed and has returned 7 results by scanning 863 events in 0.288 seconds
whereas my modified version of your answer takes 0.396 seconds.
Yes, okay: right now, I'm working with a very small data set: only 863 events. I suspect (but I don't know) that your solution will be relatively more performant when there are more events. While I haven't spent time thinking about this too deeply, I think I prefer to dedup after the stats, as per your answer, rather than doing the dedup on the original set of events. Maybe I'll be proved wrong about that, but for now, and because you've taken the trouble to answer, and you provided an insight I was missing (adding fields to the stats by clause), I'm going to accept your answer. Thanks again!
Yours is slower because you're adding more sorting and deduping after the stats 🙂
You never want to dedup if you can stats: stats is much, much faster.
That said, sometimes you have no choice but to dedup, so just do it as late in the search as possible 🙂
Thanks again for your help, much appreciated.
In this case, I can't see any alternative to a dedup with sortby. I take your point about doing it as late as possible.
The search I cite in my question produces the results I need without using stats. I'll call this "solution 1".
Your answer (a) uses stats but (b) doesn't produce the results I need. Not as-is. So, in that sense, it's not a solution.
Your answer uses stats purely (see the note below) for data reduction: it consolidates events with identical by-field values into a single row. The reduction ratio depends on the event field values. For example, if each original event had a unique peaktasks value, there would be no data reduction.
Note: The count field produced by this stats has no intrinsic value in this context. We remove it.
For my current data, with 863 events (a relatively small set compared to the set that I anticipate working with soon), that stats command outputs 224 rows, which is a significant reduction.
That stats command does not remove the need for dedup: it just reduces the number of input rows to dedup. For performance reasons, that might, as you imply, be reason enough to do it.
I edited your answer to produce the results I need, replacing the sort with a dedup. I'll call this "solution 2".
For my current data, solution 1 (the original solution cited in my question, which does not use stats) is faster than solution 2 (your answer, using stats, adjusted to produce the results I need).
I suspect, but do not know, that solution 2 (based on your answer) might be faster with more events.
I look forward to doing further testing with a much larger set of events. For example, I'm curious to see (while acknowledging that this depends on the data values) if there's a performance crossover point between a dedup-only search (solution 1) and a stats-followed-by-dedup search (solution 2).
P.S. I'll likely apply round() to the eval for the percentage field.
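For example, rounding to one decimal place (using the pct field name from the searches above):

| eval pct=round(peaktasks/maxtasks * 100, 1)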