I did a partial upgrade of one of my environments (I upgraded all components except the indexers for now, due to time constraints).
And suddenly the status is showing IOWait as red. Similar to https://community.splunk.com/t5/Splunk-Enterprise/Why-is-the-health-status-of-IOWait-red/m-p/565902#M9870
Does anyone know if this is a known issue/bug? Or should I tell the customer to file a case with support?
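For what it's worth, I've been pulling the per-indicator values behind the red status from the splunkd health report REST endpoint to see which IOWait indicator is actually tripping. A minimal sketch, assuming a local instance on the default management port 8089 and placeholder admin credentials:

# Minimal sketch: dump the splunkd health report details and look for the
# iowait feature/indicators. Host, port and credentials below are placeholders.
import json
import requests

BASE = "https://localhost:8089"   # default management port (assumption)
AUTH = ("admin", "changeme")      # placeholder credentials

resp = requests.get(
    f"{BASE}/services/server/health/splunkd/details",
    params={"output_mode": "json"},
    auth=AUTH,
    verify=False,                 # most instances use a self-signed cert
)
resp.raise_for_status()

# The health tree sits under entry[0].content; dump it and search the output
# for "iowait" to see the indicator values and their colors.
content = resp.json()["entry"][0]["content"]
print(json.dumps(content, indent=2))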
The servers in question are really doing... not much.
One of the servers in question is a master node. It's supposedly getting killed by IOWait, whereas top shows...
top - 13:12:37 up 210 days, 1:04, 1 user, load average: 0.35, 0.29, 0.28
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu0  :  4.0 us,  1.3 sy,  0.3 ni, 94.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  65958320 total,  4973352 used, 60984968 free,    48540 buffers
KiB Swap:  4194300 total,        0 used,  4194300 free.  2479532 cached Mem
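Since top only shows an instantaneous snapshot, I'm also sampling /proc/stat over a longer window to confirm the OS really isn't accumulating any iowait on this host. A minimal sketch (Linux only; field order per proc(5)):

# Minimal sketch: sample /proc/stat twice and report the iowait share per CPU
# over the interval. Field order per proc(5): user nice system idle iowait ...
import time

def snapshot():
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            # per-CPU lines only ("cpu0", "cpu1", ...), skip the aggregate "cpu" line
            if line.startswith("cpu") and line[3] != " ":
                name, *vals = line.split()
                stats[name] = list(map(int, vals))
    return stats

INTERVAL = 5  # seconds; long enough to smooth out short bursts
before = snapshot()
time.sleep(INTERVAL)
after = snapshot()

for cpu in sorted(before, key=lambda c: int(c[3:])):
    delta = [a - b for a, b in zip(after[cpu], before[cpu])]
    total = sum(delta)
    iowait_pct = 100.0 * delta[4] / total if total else 0.0   # index 4 = iowait
    print(f"{cpu}: {iowait_pct:.1f}% iowait over {INTERVAL}s")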
The other two are search heads. Again, top output:
top - 13:13:08 up 174 days, 23:12, 1 user, load average: 5.91, 6.91, 5.82
Tasks: 456 total, 2 running, 454 sleeping, 0 stopped, 0 zombie
%Cpu0  : 19.3 us,  5.0 sy,  0.0 ni, 73.7 id,  0.0 wa,  0.0 hi,  2.0 si,  0.0 st
%Cpu1  :  4.4 us,  7.7 sy,  0.0 ni, 87.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  5.1 us,  6.8 sy,  0.0 ni, 88.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  5.8 us,  5.8 sy,  0.0 ni, 88.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  6.9 us,  3.4 sy,  0.0 ni, 89.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  4.6 us,  6.0 sy,  0.0 ni, 86.4 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st
%Cpu6  :  3.8 us,  3.8 sy,  0.0 ni, 92.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 10.6 us,  3.8 sy,  0.0 ni, 85.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  6.1 us,  5.8 sy,  0.0 ni, 88.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  4.7 us,  4.4 sy,  0.0 ni, 90.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  3.9 us,  4.6 sy,  0.0 ni, 88.8 id,  0.0 wa,  0.0 hi,  2.6 si,  0.0 st
%Cpu11 :  4.4 us,  5.1 sy,  0.0 ni, 90.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  6.4 us,  5.4 sy,  0.0 ni, 87.0 id,  0.0 wa,  0.0 hi,  1.3 si,  0.0 st
%Cpu13 :  9.5 us,  2.7 sy,  0.0 ni, 86.8 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu14 :  4.7 us,  5.4 sy,  0.0 ni, 89.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  9.4 us,  4.0 sy,  0.0 ni, 86.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  5.1 us,  5.8 sy,  0.0 ni, 89.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  3.8 us,  6.2 sy,  0.0 ni, 90.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  7.2 us,  3.9 sy,  0.0 ni, 85.2 id,  0.0 wa,  0.0 hi,  3.6 si,  0.0 st
%Cpu19 :  3.1 us,  4.8 sy,  0.0 ni, 92.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  5.5 us,  5.9 sy,  0.0 ni, 88.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  7.6 us,  5.5 sy,  0.0 ni, 86.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  5.5 us,  5.9 sy,  0.0 ni, 88.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  5.7 us,  6.4 sy,  0.0 ni, 87.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  5.8 us,  4.8 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  4.5 us,  5.9 sy,  0.0 ni, 89.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  5.0 us,  7.4 sy,  0.0 ni, 87.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  4.7 us,  4.7 sy,  0.0 ni, 90.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  6.1 us,  5.1 sy,  0.0 ni, 88.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  5.7 us,  6.4 sy,  0.0 ni, 87.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  8.8 us,  5.4 sy,  0.0 ni, 85.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  8.9 us,  4.4 sy,  0.0 ni, 86.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  65938200 total,  9247920 used, 56690280 free,    15468 buffers
KiB Swap:  4194300 total,        0 used,  4194300 free.  1184380 cached Mem
As you can see, the servers are mostly idling; the search heads do some work, but not much.
To make things even more interesting, three other SHs dedicated to ES, which are stressed way more than this SH cluster, don't report any IOWait problems.
All I did was migrate the KV store to WiredTiger and upgrade Splunk from 8.1.2 to 8.2.6. That's all.
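In the meantime I'm also planning to compare the effective IOWait thresholds before and after the upgrade, in case 8.2 ships different defaults in health.conf. A sketch of pulling the stanza over REST; the stanza name feature:iowait and the indicator key names are assumptions based on the default health.conf in recent versions, and host/credentials are again placeholders:

# Minimal sketch: read the effective IOWait thresholds from health.conf over
# REST. "feature:iowait" as the stanza name is an assumption; adjust if your
# version names it differently. Host and credentials are placeholders.
import requests

BASE = "https://localhost:8089"
AUTH = ("admin", "changeme")

resp = requests.get(
    f"{BASE}/services/configs/conf-health/feature%3Aiowait",   # ':' URL-encoded
    params={"output_mode": "json"},
    auth=AUTH,
    verify=False,
)
resp.raise_for_status()

# Print only the indicator thresholds and alert settings from the stanza.
for key, value in sorted(resp.json()["entry"][0]["content"].items()):
    if key.startswith("indicator:") or key.startswith("alert"):
        print(f"{key} = {value}")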