Solved: How to generate a search to chart average response...

tonymakos · ‎01-19-2017

Hi again

This is following on from my question the other day - "How to generate a search to chart an average response time against a count of responses per minute?". I've realised that what i'm after is different from what i orginally requested - i'm looking at a throughput graph of some sort.

So, my previous "answer" to get the above was timechart span=1m count as requests_per_minute, avg(response_time) as avg_response_time but what this does is create a series of every minute in the data set against the average response time for events within that particular minute. What I actually need is an x-axis of potential values of responses per minute (say 1 to 1000) with a y-axis of average response times over the entire dataset for that value of rpm.

So essentially i need to average the averages (?) of response times for each value of x. This means that regardless of the size of dataset the axes will be the same size (within reason) but i'm stuck as to how to aggregate averages in splunk query language. Can anyone point me in the right direction? I appreciate this might be a long piece of code.... 🙂

DalJeanis · ‎01-19-2017

In splunk, as opposed to excel, it helps to look at the underlying data, and strip out everything you don't need. For each event, you just need _time and ResponseTime.

[your search] | eval Time=relative_time(_time,"@s") | fields Time, ResponseTime
| stats avg(ResponseTime) as AvgResponseTime, sum(ResponseTime) as TotalResponseTime, 
    count as NbrResponses by Time
| eventstats sum(TotalResponseTime) As SumTotalResponseTime, 
    sum(NbrResponses) as SumNbrResponses by NbrResponses
| eval AvgResponseTimeByNbrResponses = round(SumTotalResponseTime/SumNbrResponses,2)
| fields Time NbrResponses  AvgResponseTime  AvgResponseTimeByNbrResponses

That should give you a series of minute by minute, average response times for that minute, compared against the typical average response times during minutes when there are that many responses.

I've tested the code against my system and it gets reasonable results.

Translation of the code -

Take your _time and bin it to the second.

Calculate the stats for each second. NOTE - I'm including in those stats the weighted total response time, since I believe you don't want an average of averages, you want an average of events.

Calculate the weighted average of all similar time periods. NOTE - here, they have not been binned in terms of how many transactions per second. You will probably want to partition them in some fashion before the eventstats command instead of using the raw NbrResponses figures.

View solution in original post

DalJeanis · ‎01-19-2017

In splunk, as opposed to excel, it helps to look at the underlying data, and strip out everything you don't need. For each event, you just need _time and ResponseTime.

[your search] | eval Time=relative_time(_time,"@s") | fields Time, ResponseTime
| stats avg(ResponseTime) as AvgResponseTime, sum(ResponseTime) as TotalResponseTime, 
    count as NbrResponses by Time
| eventstats sum(TotalResponseTime) As SumTotalResponseTime, 
    sum(NbrResponses) as SumNbrResponses by NbrResponses
| eval AvgResponseTimeByNbrResponses = round(SumTotalResponseTime/SumNbrResponses,2)
| fields Time NbrResponses  AvgResponseTime  AvgResponseTimeByNbrResponses

That should give you a series of minute by minute, average response times for that minute, compared against the typical average response times during minutes when there are that many responses.

I've tested the code against my system and it gets reasonable results.

Translation of the code -

Take your _time and bin it to the second.

Calculate the stats for each second. NOTE - I'm including in those stats the weighted total response time, since I believe you don't want an average of averages, you want an average of events.

Calculate the weighted average of all similar time periods. NOTE - here, they have not been binned in terms of how many transactions per second. You will probably want to partition them in some fashion before the eventstats command instead of using the raw NbrResponses figures.

tonymakos · ‎01-20-2017

This is fantastic - thank you so much for helping with this.

DalJeanis · ‎01-20-2017

Quite welcome!

DalJeanis · ‎01-19-2017

You need three dimensions. It sounds like you're really looking for a heatmap.

For each time slice, you want the number of responses (either starting or concluding in that time slice) that took each length of time in your expected duration list. These vertical durations I'll call "cells" for lack of a better term.

So, you'd do a long-term query to determine your "average number of responses" or "average percentage of responses" in each cell. It's up to you how complicated you might want that. Do you account for time of day, day of week, or just total aggregate?

Then, you'd do the count, and/or the percentage calculation, for each cell for each time slice. I might calculate it both as cardinal number and percentage of the time slice, and display both in two stacked coordinated heatmaps. As well as that, the overall weighted average would also be calculated.

Generally, it would be presented as blue-cool to red-hot, although your mileage may vary there... or you could do sparklines for each vertical cell, etc...

Think about how you'd like it to look... then when you know how you'd like to present it, then we can help get the data into a format that will facilitate that.

tonymakos · ‎01-19-2017

Thanks for the feedback - really appreciate it. I'm currently working it out in Excel and this is the way i'm looking at it:

Currently i'm getting the data in three columns:
time (minute) > requests_per_minute > avg_response_time

I can then dismiss the time column, then order it by requests_per_minute. This will give multiple entries for 1rpm, 2rpm, etc. all the way up to around 2000. I'd then like to average these avg_response_time results for each rpm, and chart the result as a scatter graph with rpm on the x-axis and the averaged avg_response_time on the y-axis.

Does this make sense? It's a few steps in Excel but i'm sure i can break it down into splunk language...

How to generate a search to chart average response time for each value of responses per minute?

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

Are you a member of the Splunk Community?

How to generate a search to chart average response time for each value of responses per minute?

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...