Why is IOWait red after upgrade?

PickleRick
SplunkTrust

I partially upgraded one of my environments: all components except the indexers, which have to wait due to time constraints.

Now the health status suddenly shows IOWait as red, similar to https://community.splunk.com/t5/Splunk-Enterprise/Why-is-the-health-status-of-IOWait-red/m-p/565902#...

Does anyone know whether this is a known issue or bug? Or should I tell the customer to file a case with support?

The servers in question are really doing... not much.

One of the servers in question is a master node. It is supposedly getting killed by IOWait, whereas top shows...

top - 13:12:37 up 210 days, 1:04, 1 user, load average: 0.35, 0.29, 0.28
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu0 : 4.0 us, 1.3 sy, 0.3 ni, 94.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65958320 total, 4973352 used, 60984968 free, 48540 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free. 2479532 cached Mem

The other two are search heads. Again, the top output:

top - 13:13:08 up 174 days, 23:12, 1 user, load average: 5.91, 6.91, 5.82
Tasks: 456 total, 2 running, 454 sleeping, 0 stopped, 0 zombie
%Cpu0 : 19.3 us, 5.0 sy, 0.0 ni, 73.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu1 : 4.4 us, 7.7 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 5.1 us, 6.8 sy, 0.0 ni, 88.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 5.8 us, 5.8 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 6.9 us, 3.4 sy, 0.0 ni, 89.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 4.6 us, 6.0 sy, 0.0 ni, 86.4 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
%Cpu6 : 3.8 us, 3.8 sy, 0.0 ni, 92.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 10.6 us, 3.8 sy, 0.0 ni, 85.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 6.1 us, 5.8 sy, 0.0 ni, 88.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 4.7 us, 4.4 sy, 0.0 ni, 90.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 3.9 us, 4.6 sy, 0.0 ni, 88.8 id, 0.0 wa, 0.0 hi, 2.6 si, 0.0 st
%Cpu11 : 4.4 us, 5.1 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 6.4 us, 5.4 sy, 0.0 ni, 87.0 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
%Cpu13 : 9.5 us, 2.7 sy, 0.0 ni, 86.8 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu14 : 4.7 us, 5.4 sy, 0.0 ni, 89.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 9.4 us, 4.0 sy, 0.0 ni, 86.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 5.1 us, 5.8 sy, 0.0 ni, 89.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 3.8 us, 6.2 sy, 0.0 ni, 90.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 7.2 us, 3.9 sy, 0.0 ni, 85.2 id, 0.0 wa, 0.0 hi, 3.6 si, 0.0 st
%Cpu19 : 3.1 us, 4.8 sy, 0.0 ni, 92.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 5.5 us, 5.9 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 7.6 us, 5.5 sy, 0.0 ni, 86.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 5.5 us, 5.9 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 5.7 us, 6.4 sy, 0.0 ni, 87.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 5.8 us, 4.8 sy, 0.0 ni, 89.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 : 4.5 us, 5.9 sy, 0.0 ni, 89.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 5.0 us, 7.4 sy, 0.0 ni, 87.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu27 : 4.7 us, 4.7 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 6.1 us, 5.1 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 5.7 us, 6.4 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 8.8 us, 5.4 sy, 0.0 ni, 85.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 8.9 us, 4.4 sy, 0.0 ni, 86.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 65938200 total, 9247920 used, 56690280 free, 15468 buffers
KiB Swap: 4194300 total, 0 used, 4194300 free. 1184380 cached Mem

As you can see, the servers are mostly idle; the search heads do some work, but not much.
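For anyone who wants to check this without eyeballing dozens of rows, here is a minimal Python sketch that pulls the "wa" column out of per-CPU top lines like the ones above and reports the maximum and average. The function names parse_top_cpu and iowait_summary are made up for illustration, and this only reflects what top reports at one moment; it is not how Splunk's health check computes its indicator.

```python
import re

# Matches per-CPU rows such as "%Cpu0 : 4.0 us, ... 94.4 id, 0.0 wa, ..."
CPU_LINE = re.compile(r"^%Cpu(\d+)\s*:(.*)$")
# Matches "<number> <two-letter field>" pairs, e.g. "100.0 id" or "0.0 wa"
FIELD = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]{2})")


def parse_top_cpu(text):
    """Return {cpu_index: {field_name: percent}} for every %CpuN row."""
    cpus = {}
    for line in text.splitlines():
        match = CPU_LINE.match(line.strip())
        if not match:
            continue  # skip header, memory, and process rows
        fields = {name: float(value) for value, name in FIELD.findall(match.group(2))}
        cpus[int(match.group(1))] = fields
    return cpus


def iowait_summary(text):
    """Summarize the 'wa' (iowait) percentage across all parsed CPUs."""
    waits = [fields.get("wa", 0.0) for fields in parse_top_cpu(text).values()]
    if not waits:
        return None  # no per-CPU rows found in the input
    return {
        "cpus": len(waits),
        "max_wa": max(waits),
        "avg_wa": sum(waits) / len(waits),
    }


if __name__ == "__main__":
    sample = (
        "%Cpu0 : 4.0 us, 1.3 sy, 0.3 ni, 94.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st\n"
        "%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st\n"
    )
    print(iowait_summary(sample))
```

Feeding it output captured with top in batch mode over a few minutes would give a fairer picture than a single snapshot, since a short spike can be missed by one sample.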

To make things even more interesting, three other SHs dedicated to ES, which are stressed far more than this SH cluster, don't report IOWait problems.

All I did was migrate the KV store to WiredTiger and upgrade Splunk from 8.1.2 to 8.2.6. That's all.


richgalloway
SplunkTrust

That's a known issue.  The IOWait check is hyper-sensitive.  I advise customers to ignore that warning.



youngsuh
Contributor

Is Splunk going to fix the false positive? Is the solution just to ignore it? Is there a JIRA open?


PickleRick
SplunkTrust

I was a bit surprised, since it behaved relatively OK in the previous version. It started showing those false positives after the upgrade. I will have to mute this indicator if it gets too annoying (telling the customer to ignore the red status indicator altogether is not the best possible idea ;-))
