I have a web request log with search-time extracted fields customer_id, object_id, and response_time. I would like to craft a search that finds the top 50 most-requested object_ids for a given customer_id, and then finds the average response time for those 50 object_ids. At the end I should have a table with column headings customer_id, object_id, request_count, and average response time.
This search returns the top 50:
customer_id=X | stats values(customer_id) count as "Number of Requests" by object_id | sort -"Number of Requests" | head 50
But if I try to pipe the output to stats avg(response_time), I get an empty table ("No results found").
I can get it to work by doing the stats avg(response_time) before the head, but that seems grossly inefficient since the number of unique object_ids can be large. I'm sure I'm missing something simple!
The most straightforward way of grabbing the stats is, as you suggest at the end of your question, to calculate everything and then single out the top 50 object_ids along with their respective response-time averages. (Your original attempt returns no results because the first stats command discards response_time, so it is no longer available to the second stats downstream.) A seemingly more efficient approach is a subsearch that first gets the top 50 object_ids, then feeds those results into an outer search that computes the averages:
* [search customer_id=X | top 50 object_id | fields object_id]
| stats avg(response_time) AS response_average by object_id
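Conceptually, the subsearch approach is a two-pass job: first find the top-N object_ids, then average response times only for those ids. A minimal Python sketch of that logic, using a hypothetical list of (object_id, response_time) events for one customer:

```python
from collections import Counter, defaultdict

# Hypothetical sample events for one customer: (object_id, response_time).
events = [
    ("obj1", 120.0), ("obj1", 80.0),
    ("obj2", 200.0),
    ("obj3", 50.0), ("obj3", 70.0), ("obj3", 60.0),
]

# Pass 1 (the subsearch): find the top-N object_ids by request count.
TOP_N = 50
top_ids = {oid for oid, _ in Counter(oid for oid, _ in events).most_common(TOP_N)}

# Pass 2 (the outer search): average response_time, but only for those ids.
totals, counts = defaultdict(float), defaultdict(int)
for oid, rt in events:
    if oid in top_ids:
        totals[oid] += rt
        counts[oid] += 1
averages = {oid: totals[oid] / counts[oid] for oid in top_ids}
print(averages)
```

Note that both passes still have to walk the full event set, which is why the single-pass version below is usually just as good.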
However, I doubt you'll get any noticeable performance boost by doing this. Splunk has to retrieve all events to get the object_id counts anyway, so it might as well calculate the response-time average for each object_id while it's at it. The following search should yield the same results as the one above, so just use whichever feels best and gives the best performance. I'd be curious to hear reports of any speed differences!
customer_id=X | stats count, avg(response_time) AS response_average by object_id
| sort -count | head 50 | fields - count | sort -response_average
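The single pass that stats makes here can be sketched in Python, again with a hypothetical list of (object_id, response_time) events: accumulate count and running total per object_id in one sweep, then sort and truncate.

```python
from collections import defaultdict

# Hypothetical sample events for one customer: (object_id, response_time).
events = [
    ("obj1", 120.0), ("obj1", 80.0),
    ("obj2", 200.0),
    ("obj3", 50.0), ("obj3", 70.0), ("obj3", 60.0),
]

# Single pass: count and sum response times per object_id,
# mirroring `stats count, avg(response_time) by object_id`.
counts = defaultdict(int)
totals = defaultdict(float)
for object_id, response_time in events:
    counts[object_id] += 1
    totals[object_id] += response_time

# Sort by request count descending and keep the top N (50 in the real search),
# mirroring `sort -count | head 50`.
TOP_N = 50
top = sorted(counts, key=counts.get, reverse=True)[:TOP_N]
table = [(oid, counts[oid], totals[oid] / counts[oid]) for oid in top]
print(table)
```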
I'll let you know if I see any speed differences.