Solved: Performant method for referring to original event ...

Graham_Hanningt · ‎09-16-2019

I am working with computer systems—for this question, the type of systems is not important—that forward events to Splunk (7.3).

These events contain periodic snapshots of system performance indicators. For example, each system can perform up to some maximum number of tasks. Each event contains that number in a field named maxtasks. maxtasks can be different for each system, and the maxtasks for a system can change between events. Each event also contains a field named peaktasks that represents the peak number of tasks in that system in the period of time covered by that event.

I want to visualize peaktasks as a percentage of maxtasks for each system, and then show the 10 systems with the highest percentage value; that is, the systems that reached, or came closest to reaching, maxtasks.

The search command is fairly straightforward:

| stats max(eval(peaktasks/maxtasks*100)) as "% of maxtasks" by system
| sort - "% of maxtasks" 
| head 10

The problem: I also want to include, in the same chart, the original peaktasks and maxtasks values that correspond to those max-aggregrated percentages.

Here's my current solution:

| eval "% of maxtasks"= peaktasks/maxtasks * 100
| dedup system sortby -"% of maxtask" 
| table system, peaktasks, maxtasks, "% of maxtasks" 
| sort - "% of maxtasks"
| head 10

I overlay % of maxtasks as a line over a column chart that shows maxtasks and peaktasks side-by-side for each system.

This works. At least, I think it does; the resulting data looks right, but this is my first use of the dedup command with sortby.

Here, I'm using eval in a separate step to insert a new percentage field into each event, then using dedup with sortby to keep only a single event—the one with the highest percentage—for each system.

I have a working solution. So, what's my question?

My main question (other than: is this doing what I think it's doing?): is there a more performant way to do this?

Depending on the time range involved, the base search might return many thousands of events. eval and dedup might not be the most performant choices.

Advice, recommendations welcome.

wmyersas · ‎09-17-2019

Try this:

| eval pct=peaktasks/maxtasks * 100
| stats count by system peaktasks maxtasks pct
| sort - pct
| rename pct as "% of maxtasks"
| head 10
| fields - count

View solution in original post

wmyersas · ‎09-17-2019

Try this:

| eval pct=peaktasks/maxtasks * 100
| stats count by system peaktasks maxtasks pct
| sort - pct
| rename pct as "% of maxtasks"
| head 10
| fields - count

Graham_Hanningt · ‎09-18-2019

Thanks! I had a nagging feeling that I was missing something that ought to have been obvious to me, and you've provided the answer: use the by clause of the stats command to include the original fields whose values I want to "preserve" after the aggregation.

Nit: as provided, your answer doesn't produce the results I need. It doesn't dedup by system. So, for my particular data, the results produced by your answer are all for the same system, because that system has many events with a higher percentage value than any other system.

Here's my modified version of your answer, replacing the sort with a dedup, producing the results I need:

| eval pct=peaktasks/maxtasks * 100
| stats count by system peaktasks maxtasks pct
| dedup system sortby -pct 
| rename pct as "% of maxtasks"
| head 10
| fields - count

Interestingly, the Splunk Job Inspector indicates that this modified version of your answer takes longer—is less performant—than my solution.

For my solution, the Inspector reports:

This search has completed and has returned 7 results by scanning 863 events in 0.288 seconds

whereas my modified version of your answer takes 0.396 seconds.

Yes, okay: right now, I'm working with a very small data set: only 863 events. I suspect (but I don't know) that your solution will be relatively more performant when there are more events. While I haven't spent time thinking about this too deeply, I think I prefer to dedup after the stats, as per your answer, rather than doing the dedup on the original set of events. Maybe I'll be proved wrong about that, but for now, and because you've taken the trouble to answer, and you provided an insight I was missing (adding fields to the stats by clause, I'm going to accept your answer. Thanks again!

wmyersas · ‎09-19-2019

Yours is slower because you're adding more sorting and deduping after the stats 🙂

You never want to dedup if you can stats - stats is much much faster

That said, sometimes you have no choice but to dedup .. so just do it as late in the search as possible 🙂

Graham_Hanningt · ‎09-19-2019

Thanks again for your help, much appreciated.

In this case, I can't see any alternative to a dedup with sortby. I take your point about doing it as late as possible.

The search I cite in my question produces the results I need without using stats. I'll call this "solution 1".

Your answer (a) uses stats but (b) doesn't produce the results I need. Not as-is. So, in that sense, it's not a solution.

Your answer uses stats purely (see the note below) for data reduction: it consolidates events with identical by field values into a single row. The reduction ratio depends on the event field values. For example, if each original event has a unique peaktasks value, there would be no data reduction.

Note: The count field produced by this stats has no intrinsic value in this context. We remove it.

For my current data, with 863 events (a relatively small set compared to the set that I anticipate working with soon), that stats command outputs 224 rows, which is a significant reduction.

That stats command does not remove the need for dedup: it just reduces the number of input rows to dedup. For performance reasons, that might, as you imply, be reason enough to do it.

I edited your answer to produce the results I need, replacing the sort with a dedup. I'll call this "solution 2".

For my current data, solution 1—the original solution cited in my question, which does not use stats—is faster than solution 2—your answer, using stats, adjusted to produce the results I need.

I suspect, but do not know, that solution 2 (based on your answer) might be faster with more events.

I look forward to doing further testing with a much larger set of events. For example, I'm curious to see—while acknowledging that this depends on the data values—if there's a performance crossover point between a dedup-only (solution 1) and a stats-followed-by- dedup (solution 2).

P.S. I'll likely apply round() to the eval for the percentage field.

Performant method for referring to original event field values after aggregation?

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?

Performant method for referring to original event field values *after* aggregation?

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?

Performant method for referring to original event field values after aggregation?