Monitoring Splunk

Why is IOWait red after upgrade?

PickleRick
SplunkTrust

I did a partial upgrade of one of my environments (all components except the indexers for now, due to time constraints).

And suddenly the health status is showing IOWait as red, similar to https://community.splunk.com/t5/Splunk-Enterprise/Why-is-the-health-status-of-IOWait-red/m-p/565902#...

Does anyone know if this is a known issue/bug? Or should I tell the customer to file a case with Support?

The servers in question are really doing... not much.

One of the servers in question is a master node. It's supposedly getting killed by IOWait, whereas top shows...

top - 13:12:37 up 210 days, 1:04, 1 user, load average: 0.35, 0.29, 0.28
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu0 : 4.0 us, 1.3 sy, 0.3 ni, 94.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65958320 total, 4973352 used, 60984968 free, 48540 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free. 2479532 cached Mem

The other two are search heads. Again, the top output:

top - 13:13:08 up 174 days, 23:12, 1 user, load average: 5.91, 6.91, 5.82
Tasks: 456 total, 2 running, 454 sleeping, 0 stopped, 0 zombie
%Cpu0 : 19.3 us, 5.0 sy, 0.0 ni, 73.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu1 : 4.4 us, 7.7 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 5.1 us, 6.8 sy, 0.0 ni, 88.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 5.8 us, 5.8 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 6.9 us, 3.4 sy, 0.0 ni, 89.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 4.6 us, 6.0 sy, 0.0 ni, 86.4 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
%Cpu6 : 3.8 us, 3.8 sy, 0.0 ni, 92.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 10.6 us, 3.8 sy, 0.0 ni, 85.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 6.1 us, 5.8 sy, 0.0 ni, 88.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 4.7 us, 4.4 sy, 0.0 ni, 90.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 3.9 us, 4.6 sy, 0.0 ni, 88.8 id, 0.0 wa, 0.0 hi, 2.6 si, 0.0 st
%Cpu11 : 4.4 us, 5.1 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 6.4 us, 5.4 sy, 0.0 ni, 87.0 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
%Cpu13 : 9.5 us, 2.7 sy, 0.0 ni, 86.8 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu14 : 4.7 us, 5.4 sy, 0.0 ni, 89.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 9.4 us, 4.0 sy, 0.0 ni, 86.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 5.1 us, 5.8 sy, 0.0 ni, 89.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 3.8 us, 6.2 sy, 0.0 ni, 90.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 7.2 us, 3.9 sy, 0.0 ni, 85.2 id, 0.0 wa, 0.0 hi, 3.6 si, 0.0 st
%Cpu19 : 3.1 us, 4.8 sy, 0.0 ni, 92.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 5.5 us, 5.9 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 7.6 us, 5.5 sy, 0.0 ni, 86.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 5.5 us, 5.9 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 5.7 us, 6.4 sy, 0.0 ni, 87.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 5.8 us, 4.8 sy, 0.0 ni, 89.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 : 4.5 us, 5.9 sy, 0.0 ni, 89.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 5.0 us, 7.4 sy, 0.0 ni, 87.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu27 : 4.7 us, 4.7 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 6.1 us, 5.1 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 5.7 us, 6.4 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 8.8 us, 5.4 sy, 0.0 ni, 85.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 8.9 us, 4.4 sy, 0.0 ni, 86.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65938200 total, 9247920 used, 56690280 free, 15468 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free. 1184380 cached Mem

As you can see, the servers are mostly idling; the search heads do some work, but not much.

To make things even more interesting, three other SHs dedicated to ES, which are stressed way more than this SH cluster, don't report any IOWait problems.

All I did was migrate the KV store to WiredTiger and upgrade Splunk from 8.1.2 to 8.2.6. That's all.
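For the record, a quick way to cross-check the OS-level picture outside of top is the standard sysstat/procps tooling (the sampling intervals below are arbitrary):

# Per-device utilisation and wait times, three 5-second samples
iostat -x 5 3

# System-wide view; the "wa" column should stay near zero if top is right
vmstat 5 3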

1 Solution

richgalloway
SplunkTrust

That's a known issue.  The IOWait check is hyper-sensitive.  I advise customers to ignore that warning.
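If you want to see exactly which indicator tripped and the raw values behind it, the health report drill-down is also exposed over REST. A rough sketch (the endpoint path is recalled from the 8.x REST API reference and the credentials are placeholders, so verify both on your instance):

# Dump the full splunkd health report, including per-indicator details for feature:iowait
curl -k -u admin:changeme "https://localhost:8089/services/server/health/splunkd/details?output_mode=json"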

---
If this reply helps you, Karma would be appreciated.

youngsuh
Contributor

Is Splunk going to fix the false positive? Is the solution just to ignore it? Is there a JIRA open?


PickleRick
SplunkTrust

I was a bit surprised, since it behaved relatively OK in the previous version; the false positives only started after the upgrade. I'll have to numb this indicator if it gets too annoying (telling the customer to ignore the red status indicator altogether is not the best possible idea ;-)).
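If it comes to that, the plan would be a local health.conf override rather than asking people to ignore red banners. A minimal sketch, assuming the stanza and indicator names match what ships in $SPLUNK_HOME/etc/system/default/health.conf for 8.2.x (copy the exact names from that file rather than from this post):

# $SPLUNK_HOME/etc/system/local/health.conf
[feature:iowait]
# Either silence alerting for this one feature...
alert.disabled = 1
# ...or keep the check but loosen the thresholds so short spikes stay green
# (indicator names below are assumed from the 8.2.x defaults - verify locally)
indicator:avg_cpu__max_perc_last_3mins:yellow = 10
indicator:avg_cpu__max_perc_last_3mins:red = 25

Loosening the thresholds keeps the check useful for a genuine I/O problem, whereas alert.disabled just stops this one feature from raising alerts; either way the rest of the health report stays intact.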
