I feel like there should be an easy answer for this, but my brain isn't finding it, so hopefully someone can set me straight.
Suppose I have a log with the processing time for a number of URLs, across a number of servers. I want to toss into a summary index the top 10 longest running URLs per server, so I can later use it in a subsearch for host=foo.
In essence, this could work if top supported it:
MySearch earliest=-1d@d latest=@d
| bucket _time span=1d
| stats sum(ProcessTime) as ProcessTime by URL, host
| top limit=10 labelField=URL ProcessTime by host
| stats values(URL) by host
This also feels like something that could work if stats supported it:
MySearch earliest=-1d@d latest=@d
| bucket _time span=1d
| stats limit=10 sum(ProcessTime) as ProcessTime by URL, host
| stats values(URL) by host
How can I do what I'm trying to do?
Well, first of all, I will note that if you're using a summary index, be aware that your daily summaries won't aggregate: having the top 10 for each day in your summary does not let you derive the top 10 for, say, a whole week.
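If you do need cross-day top 10s later, one workaround (a sketch only; the index name my_summary is an assumption, and you'd normally do this via a scheduled search) is to collect the full per-URL daily sums rather than only the top 10:

MySearch earliest=-1d@d latest=@d
| stats sum(ProcessTime) as ProcessTime by URL,host
| collect index=my_summary

You can then run the top-10 pipeline below against index=my_summary over whatever span you want, at the cost of a larger summary.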
If you just want what you're asking for, though, a quick way to get this is:
MySearch earliest=-1d@d latest=@d
| bucket _time span=1d
| stats sum(ProcessTime) as ProcessTime by URL,host
| sort host,-ProcessTime
| streamstats global=f current=f window=0 count by host
| where count < 10
| fields - count
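The trick here: after sorting each host's URLs by descending ProcessTime, streamstats with current=f counts the events before the current one within each host (0 for the first), so where count < 10 keeps exactly the first ten per host. Since your stated goal is a subsearch for host=foo, once these results are in a summary index (again assuming a hypothetical index named my_summary), the later search could look something like:

MySearch host=foo
    [ search index=my_summary host=foo | dedup URL | fields URL ]
| stats sum(ProcessTime) by URL

The subsearch expands into an OR of URL=... terms, restricting the outer search to that host's top-10 URLs.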
I've turned this comment into a question of its own: http://splunk-base.splunk.com/answers/30247/top-values-by-multiple-fields-with-large-datasets
I'm working on a different scenario for the same issue now, with much greater field variability. What is the upper limit on how many values I can toss at sort | streamstats | where | fields before I start getting failures?
I'm splitting by three fields -- FieldA has 30 options, FieldB has up to 2000 and FieldC has up to 10,000. In the raw data, right now I have about 500,000 different possibilities going into the sort, with the expectation of exceeding 1,000,000 during the lifetime of the app.
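One hard limit worth knowing here, separate from streamstats: by default sort truncates its output to 10,000 results, so at 500,000 combinations the pipeline above would silently drop most rows before streamstats ever saw them. A limit of 0 removes the cap, so a sketch adapted to your fields (treating FieldA,FieldB as the group key is my assumption) would be:

MySearch
| stats sum(ProcessTime) as ProcessTime by FieldA,FieldB,FieldC
| sort 0 FieldA,FieldB,-ProcessTime
| streamstats global=f current=f window=0 count by FieldA,FieldB
| where count < 10
| fields - count

Past that, the practical ceiling is mostly search-head memory and limits.conf settings (e.g. maxresultrows), which vary by deployment, so I'd test at the 1,000,000 scale rather than assume it holds.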