List hosts with highest value

Builder

Hi,

We're debugging an issue where disk latency shoots up at a specific time. I would like to create a search that shows the host with the highest latency in each minute.

So the base search is:

index=os sourcetype=iostat | multikv fields avgWaitMillis

...but then I'm not sure how to continue. For every minute, I would like to find the host whose avgWaitMillis is the highest.

Influencer

I think you may want to pipe to the timechart command, which lets you compute statistics over time. You may be able to do something like:

..| timechart span=1m max(avgWaitMillis) as maxWait

I haven't used a split-by clause (I don't think you'll need one), but if you do, just add something like "by someField" (where someField is the field you want to split the results by).

Please see documentation:

http://docs.splunk.com/Documentation/Splunk/4.3.4/SearchReference/Timechart
http://docs.splunk.com/Documentation/Splunk/4.3.4/SearchReference/CommonStatsFunctions
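If what matters is which host had the highest value rather than the value itself, one possible approach is to put events into one-minute buckets with bin, take each host's maximum per bucket with stats, then use eventstats to attach the overall per-minute maximum to every row and keep only the hosts that match it. This is an untested sketch; it assumes the host field is carried through multikv unchanged and that avgWaitMillis is numeric. The field names hostMaxWait and minuteMaxWait are illustrative, not existing fields:

```spl
index=os sourcetype=iostat
| multikv fields avgWaitMillis
| bin _time span=1m
| stats max(avgWaitMillis) as hostMaxWait by _time, host
| eventstats max(hostMaxWait) as minuteMaxWait by _time
| where hostMaxWait=minuteMaxWait
| table _time, host, hostMaxWait
```

Because the where command compares each host's value against the per-minute maximum, two hosts that tie in the same minute would both be listed.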

Builder

To elaborate a bit on that sample table:

At time n, the avgWaitMillis of host001 equals max(avgWaitMillis) across all hosts at that time.

Likewise, at time l, the avgWaitMillis of host219 equals max(avgWaitMillis) across all hosts at that time.

Builder

Thanks, but that is still not what I'm after. useother only affects the grouping of the hosts in the chart.

timechart is really not the answer here, since I'm not concerned with the values themselves but with which host had the max value at a particular time.

Since I'm primarily interested in the hostnames, a chart is probably not the best visualization; a table would be better, with values roughly like this:

time  , host_with_highest_latency
time n, host001.domain.com
time m, host321.domain.com
time l, host219.domain.com
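A table with exactly one host per minute, shaped like the one above, might be produced by sorting the per-host maxima in descending order within each minute and keeping only the first row per minute with dedup. This is a sketch that has not been checked against real iostat data; hostMaxWait is an illustrative field name, and sort 0 is used so that sort's default 10,000-row limit does not truncate the results:

```spl
index=os sourcetype=iostat
| multikv fields avgWaitMillis
| bin _time span=1m
| stats max(avgWaitMillis) as hostMaxWait by _time, host
| sort 0 _time, -hostMaxWait
| dedup _time
| rename host as host_with_highest_latency
| table _time, host_with_highest_latency
```

Unlike an eventstats/where filter, dedup keeps only the first row it sees for each minute, so a tie between hosts is reduced to a single, arbitrary one of them.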

Influencer

Have you tried adding useother=f (mentioned in the docs), like so:

..| timechart span=1m max(avgWaitMillis) as maxWait by host useother=f

I can't remember exactly which values the useother boolean accepts, but you can also try useother=false, or the numeric equivalent (1 or 0).
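As far as I know, useother=f only hides the OTHER series; timechart still plots at most limit series per chart (10 by default). Setting limit=0 should remove that cap so that every host gets its own series. This is a hedged sketch, and with about a hundred hosts the resulting chart will likely be hard to read:

```spl
index=os sourcetype=iostat
| multikv fields avgWaitMillis
| timechart span=1m limit=0 max(avgWaitMillis) by host
```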

Builder

Thank you for your effort to help, nevertheless!

Builder

I'm afraid this doesn't do what I want at all.

That will just show the max values of avgWaitMillis, without even mentioning the host.

I want to know which host had the highest latency, not what the highest latency was.

Doing the same by host doesn't help me either: out of the hundred or so hosts, most will be lumped into OTHER. So knowing that one of the 90 hosts in OTHER had the highest latency at 21:15 and 23:30 tells me nothing.
