
How to measure web traffic variance between servers with an ITSI metric?

Path Finder

This sounds more complicated than it really is. At the root, I want to know if web traffic is balanced across my web farm.

This query accomplishes that by finding the average traffic per server in the farm (my subsearch) and then calculating how much each server varies from that average, as a percentage. It works pretty well.

host=myservers index=iis  | stats count AS server_value BY host | join [search host=myservers index=iis | stats count as LatestValue by host | stats mean(LatestValue) as client_mean] | eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)
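To make the arithmetic concrete, here is a minimal sketch on made-up numbers. The host names and counts are hypothetical, and eventstats is swapped in for the join and subsearch purely to keep the example short; it is not the query from the post.

```spl
| makeresults count=3
| streamstats count AS n
| eval host="web".n, server_value=case(n=1, 90, n=2, 100, n=3, 110)
| eventstats avg(server_value) AS client_mean
| eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)
| table host server_value client_mean percent_variance
```

With counts of 90, 100, and 110 the mean is 100, so the three hosts should come out at 10, 0, and 10 percent respectively.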

My challenge is that I want to use this metric in Splunk ITSI. In ITSI, the events behind your metrics have to have a _time associated with them. That is easy enough to do by creating time bins, running stats by _time, and joining on _time.

host=myservers index=iis  | eval host=upper(host) | bin span=5m _time | stats count AS server_value BY _time, host | join _time [search host=myservers index=iis | eval host=upper(host) | bin span=5m  _time | stats count as LatestValue by _time, host | stats mean(LatestValue) as client_mean by _time] | eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)

This technically works, but I'm running up against a classic bin issue: bin always snaps bucket boundaries to whole time units. ITSI runs metrics at 5-minute intervals by default, which makes sense. If I don't specify a span, it defaults to 5-second spans. That is too small a sample window, and the variances differ wildly. The 5-minute span above works well if the search window lines up exactly with 5-minute boundaries, but typically you get some partial 5-minute block. Whatever time span I use, I still typically end up with a partial bin that throws off the accuracy of my metric.
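The snapping behavior can be seen in isolation with a throwaway search. This is only a sketch: the date and the two event times are arbitrary values chosen for illustration, and it assumes a time zone whose offset is a multiple of 5 minutes.

```spl
| makeresults count=2
| streamstats count AS n
| eval _time=strptime(if(n=1, "2018-01-01 05:03:00", "2018-01-01 05:07:00"), "%Y-%m-%d %H:%M:%S")
| eval event_time=strftime(_time, "%H:%M:%S")
| bin span=5m _time
| eval bucket_start=strftime(_time, "%H:%M:%S")
| table event_time bucket_start
```

The two events are only 4 minutes apart, yet they should land in different buckets (05:00:00 and 05:05:00), which is how a 5-minute search window ends up split across two partial bins.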

Any ideas for solutions or alternative approaches?

1 Solution

Path Finder

I found a fairly straightforward solution. Since I want time buckets that don't snap to fixed time boundaries, I implemented my own binning logic. It's pretty compact, and you can use it as a drop-in replacement for a bin command.

The new logic makes bins of whatever size you want but works backwards from the upper bound (latest time) of your search time range. If you run it with 10-minute spans at 5:25:35.112, your last bin will contain all the events from 5:15:35.112 to 5:25:35.112. With Splunk's default bin logic, your last bin would run from 5:20 to 5:25:35.112, which isn't the 10-minute span you asked for. You just have to make sure the time range you select is evenly divisible by your span. I haven't tested what happens with, say, a 30-minute span on data that only covers a 45-minute time range.
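The 10-minute example can be checked with a throwaway search that applies the replacement expression given under Solution below. This is a sketch under assumptions: info_max_time is set by hand to 05:25:35 (whole seconds, on an arbitrary date) rather than coming from addinfo, and the two event times are made up.

```spl
| makeresults count=2
| streamstats count AS n
| eval info_max_time=strptime("2018-01-01 05:25:35", "%Y-%m-%d %H:%M:%S")
| eval _time=strptime(if(n=1, "2018-01-01 05:20:00", "2018-01-01 05:10:00"), "%Y-%m-%d %H:%M:%S")
| eval event_time=strftime(_time, "%H:%M:%S")
| eval _time=info_max_time-(ceil((info_max_time-_time)/600))*600
| eval bucket_start=strftime(_time, "%H:%M:%S")
| table event_time bucket_start
```

The 05:20:00 event is 335 seconds before the upper bound, so ceil(335/600)=1 and its bucket should start at 05:15:35. The 05:10:00 event is 935 seconds back, so ceil(935/600)=2 and its bucket should start at 05:05:35. By the same arithmetic, a time range that is not a multiple of the span would presumably leave the oldest bucket partial (for example, only 15 minutes of data in a 30-minute bucket over a 45-minute range), though that is reasoning from the formula and has not been tested.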

Solution:

Replace your bin command that might look like this:

bin span=5m _time

With this:

addinfo | eval _time=info_max_time-(ceil((info_max_time-_time)/300))*300

Replace both occurrences of 300 with your desired span in seconds. Here, 300 seconds matches the original 5-minute span.
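Put together with the variance search from the question, the replacement might look like the sketch below. One deliberate change to note: eventstats is used here in place of the original join and subsearch, so the per-bucket mean is computed in a single pass and there is no second search whose info_max_time has to match. The host and index values are the ones from the question.

```spl
host=myservers index=iis
| eval host=upper(host)
| addinfo
| eval _time=info_max_time-(ceil((info_max_time-_time)/300))*300
| stats count AS server_value BY _time, host
| eventstats avg(server_value) AS client_mean BY _time
| eval percent_variance=round(((abs(server_value-client_mean)/client_mean)*100), 0)
```

As in the original search, a host with no events in a bucket produces no row for that bucket, so it is not counted in that bucket's mean.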


SplunkTrust

I think makecontinuous would be a good solution here.

You would need to specify a time span, and it will add empty buckets wherever there is no data to fill the span.

... | makecontinuous _time span=10m

http://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/Makecontinuous


Path Finder

I'm not having an issue with data gaps, which makecontinuous can address, but rather with bin snapping buckets to even 5-minute boundaries. If I run a 5-minute span against the last 5 minutes, I want one bucket with all of my data in it. Instead, if I run it at 5:07:30, I get a 5:00-5:05 bin with half my data and a 5:05-5:10 bin with the other half. The same thing happens with 5-minute spans across 60 minutes of data: the first and last bins hold incomplete samples.

Separately, I did try makecontinuous based on your suggestion but I couldn't make it work. My understanding is that it is a replacement for bin. However, when I tried creating 5 minute spans, it created 1 second spans. I tried running against 5 and 60 minutes but it created 1 second spans both times.

Am I doing something wrong here?

host=myWebServers* index=iis | makecontinuous  _time span=5m | stats count BY _time, host
