This sounds more complicated than it really is. At the root, I want to know if web traffic is balanced across my web farm.
This query accomplishes that by finding the average traffic per server in the farm (my subsearch) and then calculating how much each server varies from that average. It works pretty well.
host=myservers index=iis | stats count AS server_value BY host | join [search host=myservers index=iis | stats count as LatestValue by host | stats mean(LatestValue) as client_mean] | eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)
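To make the arithmetic in that query concrete, here is a minimal Python sketch of the same calculation: take the per-host counts, average them, and express each host's distance from the average as a rounded percentage. The host names and counts are made up for illustration, and `percent_variance` is a hypothetical helper, not anything Splunk provides.

```python
def percent_variance(counts):
    """Map each host to its rounded percent deviation from the farm-wide mean.

    counts: dict of host -> event count (illustrative data, not real traffic).
    """
    # Same role as the subsearch: the mean of the per-host counts.
    client_mean = sum(counts.values()) / len(counts)
    # Same role as the eval: abs(server_value - client_mean) / client_mean * 100.
    return {
        host: round(abs(server_value - client_mean) / client_mean * 100)
        for host, server_value in counts.items()
    }


sample = {"WEB01": 1200, "WEB02": 1000, "WEB03": 800}
print(percent_variance(sample))  # {'WEB01': 20, 'WEB02': 0, 'WEB03': 20}
```

A perfectly balanced farm gives 0 for every host, and the number grows as a host drifts above or below the average.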
My challenge is that I want to use this metric in Splunk ITSI. In ITSI, your metrics have to have a _time associated with the events. This is easy enough to do by creating bins, doing stats by _time, and joining on _time.
host=myservers index=iis | eval host=upper(host) | bin span=5m _time | stats count AS server_value BY _time, host | join _time [search host=myservers index=iis | eval host=upper(host) | bin span=5m _time | stats count as LatestValue by _time, host | stats mean(LatestValue) as client_mean by _time] | eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)
This technically works, but I'm running up against a classic "bin" issue: bin always aligns its buckets to whole time boundaries (for a 5-minute span, :00, :05, :10, and so on) rather than to the start of the search window. ITSI runs metrics at 5-minute intervals by default, and that makes sense. If I don't specify a span, it defaults to 5-second spans. That is too small a sample, and the variances differ wildly. The 5-minute span above works well when the search window lines up exactly with a 5-minute boundary, but typically the window includes some partial 5-minute block. Whatever span I use, I usually still end up with a partial bin that throws off the accuracy of my metric.
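The partial-bin effect can be shown with a small Python sketch. It assumes bins snap to multiples of the span in epoch seconds, which is a simplification of how bin behaves, and the helper names (`bin_start`, `bin_coverage`, `full_bins`) are hypothetical. It reports how many seconds of each bin a search window actually covers.

```python
SPAN = 300  # 5 minutes, in seconds


def bin_start(epoch, span=SPAN):
    """Snap a timestamp down to the start of its bin (assumed epoch-aligned)."""
    return epoch - (epoch % span)


def bin_coverage(earliest, latest, span=SPAN):
    """List (bin_start, covered_seconds) for the window [earliest, latest)."""
    coverage = []
    start = bin_start(earliest, span)
    while start < latest:
        covered = min(latest, start + span) - max(earliest, start)
        coverage.append((start, covered))
        start += span
    return coverage


def full_bins(earliest, latest, span=SPAN):
    """Keep only the bins that the window covers completely."""
    return [s for s, covered in bin_coverage(earliest, latest, span) if covered == span]


# A 5-minute window starting 2 minutes past a boundary touches two bins,
# and neither of them is complete.
print(bin_coverage(720, 1020))  # [(600, 180), (900, 120)]
print(full_bins(720, 1020))     # []
# The same length of window, aligned to a boundary, fills exactly one bin.
print(bin_coverage(600, 900))   # [(600, 300)]
```

Under that assumption, a 5-minute window that starts off-boundary never yields a complete bin, so the per-bin counts understate the traffic, and the mean is taken over partial samples.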
Any ideas for solutions or alternative approaches?