Solved: Re: How to subset top N records from the number ge...

zztc2004 · ‎10-11-2017

Hi Splunk friends,

I am new to Splunk community and currently facing a question.
I have below table which was generated by some raw log-line data .
stats2 is actually the aggregated sum of stats1 group by ID.

ID  stat1   stats2(eventstats sum of stat1 by ID)
1   1          6
1   2          6
1   3          6
2   4          9
2   5          9
3   6          21
3   7          21
3   8          21
4   9          10
4   0          10
4   1          10

What I am looking for is, returning the subset of below table and only pick top N =2 in terms of stats2, for example:

ID  stat1   stats2(eventstats sum of stat1 by ID)
3      6       21
3      7       21
3.     8       21
4      9       10
4      0       10
4      1       10

I tried several methods that all failed, and I do not want to leverage join statement, which is not efficient in Splunk.

Thanks so much for the help.
Jay

DalJeanis · ‎10-11-2017

Here's one way

  your query
 | appendpipe [| dedup stats2 | sort 2 - stats2 | table stats2 | eval keepme="Y"]
 | eventstats values(keepme) as keepme by stats2 
 | where isnotnull(keepme) and isnotnull(stats1)

Of course, it would be more efficient to, instead of | eventstats to calculate stats2, to do the appendpipe strategy and summarize by ID directly, keeping the top 2.

View solution in original post

zztc2004 · ‎10-12-2017

I figured out another way to realize my need.
Basically, by leveraging the "sub-search" to extract and return the orgID with the top N value of stat2, and use the result of sub-search as the filter criteria to the primary search query.
For sub-search, pls refer here: http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Search/Aboutsubsearches

kyaparla · ‎10-11-2017

if I understand it correctly, you want rows with top 2 stats2 values right?

some thing like this should work. replace regexes or eval x= expression based on your data.

DalJeanis · ‎10-11-2017

@kayaparla - nice. That would work.

DalJeanis · ‎10-11-2017

Here's one way

  your query
 | appendpipe [| dedup stats2 | sort 2 - stats2 | table stats2 | eval keepme="Y"]
 | eventstats values(keepme) as keepme by stats2 
 | where isnotnull(keepme) and isnotnull(stats1)

Of course, it would be more efficient to, instead of | eventstats to calculate stats2, to do the appendpipe strategy and summarize by ID directly, keeping the top 2.

zztc2004 · ‎10-12-2017

Hi @DalJeanis,

Any idea, why I got error when I try to use sort within appendpipe[]?
So confused....

Jay

zztc2004 · ‎10-11-2017

Thanks, this appendpipe is very helpful for my case!

Jay

How to subset top N records from the number generated from eventstats

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes