Select the Top 1 from a set of TopN

Motivator

Using the Splunk query language, what query would return the Top 1 from each set of Top N?

Data set sample:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 10:00   ciao          9 
2014-03-28 10:00   nice          7
2014-03-28 11:00   nice         11
2014-03-28 11:00   great         8 
2014-03-28 11:00   precise       6
2014-03-28 12:00   yougotit      6
2014-03-28 12:00   ok            4 
2014-03-28 12:00   thanks        3

The Splunk query should return the top 1 of each Top N set. Example:

time               Term         count
2014-03-28 10:00   hello        10
2014-03-28 11:00   nice         11
2014-03-28 12:00   yougotit      6

Thanks,
Lp

My solution:

After reading the suggestions in the answers below, I took the following approach:

1) Create a summary index.

2) Create an hourly scheduled search to get the Top N and store the results in the summary index. Splunk query:

index="my_raw_index" | eval time=strftime(_time, "%m/%d/%Y:%H:%M")
| top limit=0 term by time | streamstats count as rank by time | table time rank term count

Result set:

time              rank  Term         count
2014-03-28 10:00   1    hello        10
2014-03-28 10:00   2    ciao          9
2014-03-28 10:00   3    nice          7

3) Then, using the rank field, it is simple to get the Top 1 of each Top N set from the summary index. Query example:

index=my_summary_index rank=1 | table time Term count

I think this approach would scale quite well.
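
For anyone who wants to try the ranking step without setting up a summary index, here is a minimal run-anywhere sketch. It assumes the sample data from the question, typed inline through makeresults; the raw field and the rex pattern are only scaffolding to build test rows and are not part of the solution above. The ranking is done per hour by adding "by time" to streamstats.

```spl
| makeresults
| eval raw="2014-03-28 10:00,hello,10;2014-03-28 10:00,ciao,9;2014-03-28 10:00,nice,7;2014-03-28 11:00,nice,11;2014-03-28 11:00,great,8;2014-03-28 11:00,precise,6;2014-03-28 12:00,yougotit,6;2014-03-28 12:00,ok,4;2014-03-28 12:00,thanks,3"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<time>[^,]+),(?<Term>[^,]+),(?<count>\d+)$"
| sort 0 time, -num(count)
| streamstats count as rank by time
| where rank=1
| table time Term count
```

On the sample data this should leave the hello, nice, and yougotit rows. Dropping the "where rank=1" line shows the full ranking for each hour.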

Thanks,
Lp


Re: Select the Top 1 from a set of TopN

Champion

your search | sort -time, -count | dedup time
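
A run-anywhere sketch of the sort-then-dedup idea, assuming the sample data from the question typed inline (the makeresults, makemv, and rex lines are only scaffolding for test rows). Deduplicating on time alone keeps the first row per hour, which after the sort is the row with the highest count.

```spl
| makeresults
| eval raw="2014-03-28 10:00,hello,10;2014-03-28 10:00,ciao,9;2014-03-28 10:00,nice,7;2014-03-28 11:00,nice,11;2014-03-28 11:00,great,8;2014-03-28 11:00,precise,6;2014-03-28 12:00,yougotit,6;2014-03-28 12:00,ok,4;2014-03-28 12:00,thanks,3"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<time>[^,]+),(?<Term>[^,]+),(?<count>\d+)$"
| sort 0 time, -num(count)
| dedup time
| table time Term count
```

The "sort 0" lifts the default limit on the number of sorted results, which matters once the base search returns more rows than this small sample.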

Re: Select the Top 1 from a set of TopN

Motivator

maybe this?

... your search ... | stats max(count) as count by Term

Re: Select the Top 1 from a set of TopN

SplunkTrust

Or you can take both examples and combine them, like this run-anywhere example:

index=_internal | bucket _time span=1h | eval myTime=_time | stats max(kbps) as max by series, myTime | sort - myTime, max | dedup myTime, max | eval myTime=strftime(myTime, "%F %T")

This will give you the highest throughput per hour per series. You will have to adapt it to match your needs.

cheers, MuS

Re: Select the Top 1 from a set of TopN

SplunkTrust

Try this:

Your base search | sort _time, -count | streamstats count as sno by _time | where sno <2

This gives the top 1 from each hour.

Re: Select the Top 1 from a set of TopN

Motivator

I use this one for hourly ranking, just to share.
I am assuming the same term can appear more than once in an hour, so I use "stats max(count) by ..."; if that is not your case, change it to fit your needs.

index="mine" filter_event
| bucket _time span=1h
| stats max(count) as count by term _time
| sort - count
| eval rank=1
| streamstats sum(rank) as rank by _time
| where rank<4
| xyseries _time rank term

This gives you the top 3 for each hour.
Change "where rank<4" to "where rank=1" to get only the top 1 for each hour, and see how it goes.

Re: Select the Top 1 from a set of TopN

Motivator

What about

... | top 1 count by Term