We're debugging an issue where disk latency shoots up at a specific time. I would like to create a search which shows the host with the highest latency at any specific minute.
So the base search is:
index=os sourcetype=iostat | multikv fields avgWaitMillis
...but then I'm not sure how to continue. For every minute, I would like to find the host whose avgWaitMillis is the highest.
I think you may want to pipe to the timechart command, which will allow you to gather stats over time. You may be able to do something like:
..| timechart span=1m max(avgWaitMillis) as maxWait
I haven't used a split-by clause (I don't think you'll need one), but if you do, just add something like "by someField" (where someField is the field you want to split by).
Please see documentation:
To elaborate a bit on that sample table:
At time n, the avgWaitMillis of host001 equals max(avgWaitMillis) of all hosts at that time.
Likewise, at time l, the avgWaitMillis of host219 equals max(avgWaitMillis) of all hosts at that time.
Thanks, but that is still not what I'm after.
useother only affects the grouping of the hosts in the chart.
timechart is really not the answer here, since I'm not concerned about the values themselves, but which hosts had the max value at a particular time.
Since I'm primarily interested in the hostnames, a chart is probably not the best visualization, but rather a table, with values about like this:
time, host_with_highest_latency
time n, host001.domain.com
time m, host321.domain.com
time l, host219.domain.com
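One way to get a table of this shape, offered as an untested sketch rather than a definitive answer: bucket the events into one-minute bins, take each host's maximum per bin, then keep only the top host in each bin. The field name maxWait is an illustrative choice, and the sketch assumes avgWaitMillis is extracted by multikv as in the base search.

```spl
index=os sourcetype=iostat
| multikv fields avgWaitMillis
| bin _time span=1m
| stats max(avgWaitMillis) as maxWait by _time, host
| sort 0 _time, -maxWait
| dedup _time
| table _time, host, maxWait
```

Here sort 0 lifts the default result limit of sort, and dedup _time keeps the first row per minute, which after the sort is the host with the highest value. If two hosts tie in the same minute, only one of them is kept.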
Have you tried adding useother=f (mentioned in the docs), like so:
..| timechart span=1m max(avgWaitMillis) as maxWait by host useother=f
I can't remember how specific the useother boolean needs to be, but you can also try useother=false, or the binary equivalent (e.g. "1" or "0").
I'm afraid this doesn't do what I want at all.
That will just show the max values of avgWaitMillis, without even mentioning the host.
I want to know which host had the highest latency, not what the highest latency was.
Doing the same by host doesn't help me either, because out of the hundred or so hosts, the majority will be lumped into OTHER. So knowing that one host of 90 in OTHER had the highest latency at 21:15 and 23:30 reveals nothing.
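If the chart form is still wanted, the lumping into OTHER comes from timechart's series limit, which defaults to 10. A sketch that lifts the limit, assuming roughly a hundred series is acceptable to render:

```spl
index=os sourcetype=iostat
| multikv fields avgWaitMillis
| timechart span=1m limit=0 useother=f max(avgWaitMillis) by host
```

This gives one column per host, with no OTHER bucket, but it still shows the values rather than naming the host with the maximum.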