Splunk Search

weighted average and coalescing raw data



this is how my log file looks:

2012-06-20 10:50:00 label="aService",avg=340.44,min=290,max=537,stdDev=52.78,count=217

2012-06-20 10:55:00 label="aService",avg=484.11,min=333,max=1243,stdDev=222.97,count=15

as you can notice I already have some coalescence produced by the logging service, perf4j in this case. Now I would like to generate a chart representing the average execution time, over a variable period of time. In order to do so and apply a consistent scaling, I need to weight the averages of each log entry by using both the avg and count fields.

Do you know how this can be achieved in splunk search syntax?

Tags (3)


I suggest the eval command, along with a few other steps:

yoursearchhere |
eval sumX=avg*count |
eval countX=count |
bucket _time span=1h | 
stats sum(sumX) as hourlySum sum(countX) as hourlyCount by _time |
eval hourlyAvg = round(hourlySum / hourlyCount, 1) |
fields hourlyAvg _time |
chart max(hourlyAvg) by _time

I know this is a bit odd, but I think it will work. First, sumX tries to get back to the original value that was used to create the average (an average = a sum / a count).

Then the data is grouped into hourly time "buckets" and a new average is computed. You could change the time interval to whatever you like in the bucket command.

Finally, the new average is plotted against the time buckets. The max(hourlyAvg) is a bit of a kludge, but the chart command requires a function for the Y axis - since there is only one value per interval, the max function will return the single value.

For my example, I assumed that you would run this search over a time period like "Last 24 hours" and summarize the data hourly, but you can change the time selection and the bucket to whatever you need. Just be aware that the is a limit to how many time periods Splunk can display in chart format. If you have too many, try the table format instead.