I did a partial upgrade of one of my environments (upgraded all components except the indexers for now, due to time constraints).
Suddenly the health status is showing IOWait as red, similar to https://community.splunk.com/t5/Splunk-Enterprise/Why-is-the-health-status-of-IOWait-red/m-p/565902#...
Does anyone know if this is a known issue/bug? Or should I tell the customer to file a case with Support?
The servers in question are really doing... not much.
One of the servers in question is a cluster master node. It's supposedly getting killed by IOWait, whereas top shows...
top - 13:12:37 up 210 days, 1:04, 1 user, load average: 0.35, 0.29, 0.28
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu0 : 4.0 us, 1.3 sy, 0.3 ni, 94.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65958320 total, 4973352 used, 60984968 free, 48540 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free. 2479532 cached Mem
The other two are search heads. Again, the top output:
top - 13:13:08 up 174 days, 23:12, 1 user, load average: 5.91, 6.91, 5.82
Tasks: 456 total, 2 running, 454 sleeping, 0 stopped, 0 zombie
%Cpu0 : 19.3 us, 5.0 sy, 0.0 ni, 73.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu1 : 4.4 us, 7.7 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 5.1 us, 6.8 sy, 0.0 ni, 88.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 5.8 us, 5.8 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 6.9 us, 3.4 sy, 0.0 ni, 89.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 4.6 us, 6.0 sy, 0.0 ni, 86.4 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
%Cpu6 : 3.8 us, 3.8 sy, 0.0 ni, 92.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 10.6 us, 3.8 sy, 0.0 ni, 85.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 6.1 us, 5.8 sy, 0.0 ni, 88.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 4.7 us, 4.4 sy, 0.0 ni, 90.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 3.9 us, 4.6 sy, 0.0 ni, 88.8 id, 0.0 wa, 0.0 hi, 2.6 si, 0.0 st
%Cpu11 : 4.4 us, 5.1 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 6.4 us, 5.4 sy, 0.0 ni, 87.0 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
%Cpu13 : 9.5 us, 2.7 sy, 0.0 ni, 86.8 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu14 : 4.7 us, 5.4 sy, 0.0 ni, 89.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 9.4 us, 4.0 sy, 0.0 ni, 86.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 5.1 us, 5.8 sy, 0.0 ni, 89.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 3.8 us, 6.2 sy, 0.0 ni, 90.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 7.2 us, 3.9 sy, 0.0 ni, 85.2 id, 0.0 wa, 0.0 hi, 3.6 si, 0.0 st
%Cpu19 : 3.1 us, 4.8 sy, 0.0 ni, 92.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 5.5 us, 5.9 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 7.6 us, 5.5 sy, 0.0 ni, 86.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 5.5 us, 5.9 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 5.7 us, 6.4 sy, 0.0 ni, 87.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 5.8 us, 4.8 sy, 0.0 ni, 89.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 : 4.5 us, 5.9 sy, 0.0 ni, 89.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 5.0 us, 7.4 sy, 0.0 ni, 87.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu27 : 4.7 us, 4.7 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 6.1 us, 5.1 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 5.7 us, 6.4 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 8.8 us, 5.4 sy, 0.0 ni, 85.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 8.9 us, 4.4 sy, 0.0 ni, 86.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65938200 total, 9247920 used, 56690280 free, 15468 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free. 1184380 cached Mem
As you can see, the servers are mostly idling and the search heads do some work, but not much; there is no iowait to speak of (0.0 wa across every CPU).
To make things even more interesting, three other SHs dedicated to ES, which are stressed far more than this SH cluster, don't report any IOWait problems.
All I did was migrate the KV store to WiredTiger and upgrade Splunk from 8.1.2 to 8.2.6. That's all.
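(For reference, the KV store change was the standard storage-engine migration via the Splunk CLI, roughly the command below; check the KV store migration docs for your version for the exact pre- and post-migration steps.)

splunk migrate kvstore-storage-engine --target-engine wiredTiger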
That's a known issue. The IOWait check is hyper-sensitive. I advise customers to ignore that warning.
Is Splunk going to fix the false positive? Is the solution just to ignore it? Is there a JIRA open for it?
I was a bit surprised, since it behaved relatively OK in the previous version; it only started showing these false positives after the upgrade. I will have to mute this indicator if it starts getting too annoying (telling the customer to ignore the red status indicator altogether is not the best possible idea ;-)).
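If I do mute it, that should be doable per feature in health.conf rather than ignoring the whole health report. A minimal sketch, assuming the [feature:iowait] stanza and indicator names from health.conf.spec (I'd verify them with btool on the actual version; the threshold numbers are only illustrative, not recommendations):

# $SPLUNK_HOME/etc/system/local/health.conf
[feature:iowait]
# raise the yellow/red thresholds so short spikes don't flip the indicator
indicator:avg_cpu__max_perc_last_3min:yellow = 15
indicator:avg_cpu__max_perc_last_3min:red = 25
indicator:single_cpu__max_perc_last_3min:yellow = 30
indicator:single_cpu__max_perc_last_3min:red = 50
# there is also a sum_top3_cpu_percs__max_last_3min indicator that can be tuned the same way
# to stop notifications for this feature only:
alert.disabled = 1
# or, as a last resort, disable the whole check:
# disabled = 1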