I am not sure if anyone else has encountered this, but in our distributed environment that was just upgraded from 8.0.3 to 8.2.2, we have noticed issues with the health report manager. The new IOWait feature in the health report is extremely "chatty" even though all other aspects of the deployment are in great shape. Even though we can successfully disable the IOWait feature in the console and via a local health.conf file, the feature is still being included in the health report. I've opened up a case with Splunk support, but was just wondering if anyone else has encountered this behavior.
Splunk have now updated their documentation regarding disabling health report features.
It states in a box:
If distributed health reporting is enabled for your deployment, disabling a feature on the local instance will not be reflected in the health report.
It seems the workaround to disable a feature in +8.2 has just become a feature. The old behavior in +8.1, in which you could disable a single feature regardless of the distributed health report, has been "improved".
If anyone submits an Idea to bring the old behavior back, let me know. You get my vote.
Thanks to you all for reporting this. We experienced the same issue in our upgrade from 8.0.3 to 126.96.36.199. Our Splunk support contact confirmed they're aware of it, and for now we're hoping for an expeditious resolution rather than disabling reporting on it.
Question for @nunoaragao, is there a public page where we can track the status of SPL-213405? I was hoping it was a public bug report, but not being able to find it with Google, I gather that's the ticket number for the case you opened with support?
Hi @samjenk_2, my cases with Splunk Support were #2733102 and #2737559. SPL-213405 is Splunk's internal JIRA to track this issue. It may or may not show up later in Splunk's release notes as a known issue. It's still being investigated. If you deal with Support, you can ask them to link your case to it.
My issue is that the docs say "You can disable any feature (...) for example, if you want to exclude a feature's status from the health report". So we expect to be able to disable a specific feature (e.g. Buckets) without having to disable distributed_health_reporter, which would also disable/hide a lot of other features in a typical topology with search head clusters and clustered indexers. In other words, we want to tell the search head to gray out an indexer peer feature even if that peer is reporting health.
Does that match your scenario?
Thanks for the ticket references, @nunoaragao. I just heard from our Splunk administrator that Splunk has gotten back to him on this matter. With the caveat that your Splunk deployment (and the deployment of others) may be different from ours, they told us that the IOWait thresholds were simply set too aggressively, and that we can either disable the health report like others are doing, or tune the IOWait thresholds in health.conf, as described here:
Disappointingly, it looks like the example health.conf file in Splunk's online documentation doesn't include a description of the iowait feature stanza, but it's reasonably documented in the file $SPLUNK/etc/system/default/health.conf. There are several different metrics tracked in this feature, and in our file, they all have to do with CPU utilization (I would have expected disk I/O).
Personally, I'm inclined to monitor CPU utilization with other tools to work out what a typical load looks like for these metrics, and adjust the thresholds accordingly. Or maybe disable the IOWait feature, but leave the rest of the health features enabled.
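For anyone who wants to try the threshold route, here is a minimal sketch of what a local override might look like. Treat every name in it as an assumption to verify: the stanza name and the indicator name are as I recall them from an 8.2 default health.conf, and the numbers are placeholders, not recommendations. Copy the exact stanza and indicator names from your own default health.conf, and put the override in a local health.conf rather than editing the default file.

```
# Local health.conf override (sketch only).
# ASSUMPTION: stanza and indicator names must match your default health.conf.
# ASSUMPTION: the values below are placeholders; pick them from your own baseline.
[feature:iowait]
indicator:avg_cpu__max_perc_last_3m:yellow = 10
indicator:avg_cpu__max_perc_last_3m:red = 20
```

The other indicators in the same stanza follow the same indicator:<name>:<color> pattern, so each one can be raised separately once you know what normal looks like on your hosts.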
Hi @samjenk_2 ,
These are the events used to toggle health colours in Splunk's Health Report.
Searching them shows what the typical values are, and on which servers.
index=_introspection sourcetype=splunk_resource_usage component=IOWait
I also found this issue on 188.8.131.52, and Splunk is tracking this on SPL-213405.
My issue is that I disabled the IOWait and Buckets health checks on the search heads, but as long as search peers report issues with these features, they still show up. The only exception is if we disable the distributed health report, but then we lose visibility on a lot of other health checks.
And apparently, the documentation now reports that distributed health report is enabled by default.
I noticed the same thing upon upgrading to 8.2.2 with IOWait.
I had to disable the IOWait health check on the Search Head and also on the Indexers, but it sounds like disabling the distributed health report would be a better solution.
Just in case anyone else experiences this issue, what I did to resolve it was disable the distributed health report which is enabled by default in 8.2. Once done, I was able to successfully disable the feature in the health report manager.
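For reference, this is roughly what that change looks like in health.conf. The stanza name is the distributed_health_reporter setting discussed in this thread, and disabled = 1 is the opposite of the documented default of disabled = 0. Placing it in a local health.conf on the instance where you view the health report is my assumption, so check it against your own deployment before relying on it.

```
# Local health.conf (sketch only).
# Turns off the distributed health report on this instance.
# The documented default is disabled = 0, which means enabled.
[distributed_health_reporter]
disabled = 1
```

Keep in mind the trade-off raised elsewhere in this thread: with the distributed report off, you also lose visibility into the other health checks reported by the search peers.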
Yes, that is what I was talking about. Even though the documentation indicates that the distributed report is disabled by default, I noticed in my environment that after upgrading from 8.0.3 to 8.2.2, the feature was enabled everywhere.
Looks like the documentation has been updated: https://docs.splunk.com/Documentation/Splunk/8.2.2/DMC/Aboutfeaturemonitoring
"By default, the distributed health report is enabled (set to disabled = 0) in health.conf"