I upgraded from 7.2 to 8.0, and then from 8.0 to 8.2.
Since upgrading our distributed deployment, I have been bombarded with Health Alert emails.
"sum_top3_cpu_percs__max_last_3m" is red due to the following: "Sum of 3 highest per-cpu iowaits reached red threshold of 15"
"avg_cpu__max_perc_last_3m" is red due to the following: "System iowait reached red threshold of 3"
"single_cpu__max_perc_last_3m" is red due to the following: "Maximum per-cpu iowait reached red threshold of 10"
I was getting them from my Indexers yesterday, but this morning they seem to be coming from our Enterprise Security SH, our Deployment Server, and our regular Search Head.
I am unable to disable these alerts due to our company's policy.
What can I do to either a.) resolve this cpu/iowait issue or b.) change the alert settings?
I don't notice a difference in performance. I'm just curious as to what's causing this CPU usage spike.
It seems to me that, taking "avg_cpu__max_perc_last_3m" as an example, it is going to alert me whenever the CPU usage is above 3%?
You can change the thresholds on each Splunk Enterprise instance. Most of what is described here is configured locally on each instance. See this Answers post: https://community.splunk.com/t5/Splunk-Enterprise/Where-do-I-configure-the-health-conf-so-that-I-can...
You can change them in health.conf: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Healthconf
I think many people, myself included, have found the iowait check too sensitive in 8.2.
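As a rough sketch of what such an override might look like: the indicator names below are taken from the alert text in the question, and the stock red thresholds there are 15, 3, and 10. The stanza name [feature:iowait], the file location, and the replacement values are my assumptions, so check them against the health.conf spec for your version before using them.

```ini
# $SPLUNK_HOME/etc/system/local/health.conf  (assumed location; set on each instance)
# Example values only. Pick thresholds that match your own baseline iowait.
[feature:iowait]
# System-wide iowait (red alert in the question fired at 3)
indicator:avg_cpu__max_perc_last_3m:yellow = 10
indicator:avg_cpu__max_perc_last_3m:red = 20
# Highest single-CPU iowait (red alert in the question fired at 10)
indicator:single_cpu__max_perc_last_3m:yellow = 20
indicator:single_cpu__max_perc_last_3m:red = 40
# Sum of the 3 highest per-CPU iowaits (red alert in the question fired at 15)
indicator:sum_top3_cpu_percs__max_last_3m:yellow = 30
indicator:sum_top3_cpu_percs__max_last_3m:red = 60
```

This keeps the feature enabled, which should fit a policy that forbids disabling the alerts, and only makes the check less sensitive. Note that these indicators measure iowait (CPU time spent waiting on disk I/O), not overall CPU usage. You will probably need to restart splunkd for the change to take effect.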
For more details on this issue, go to the following Splunk Answer: https://community.splunk.com/t5/Splunk-Enterprise/Cannot-Disable-Health-Report-Features-in-8-2-2/m-p...
Splunk have now updated their documentation on disabling health report features.
It states in a box:
"If distributed health reporting is enabled for your deployment, disabling a feature on the local instance will not be reflected in the health report."
It seems the workaround for disabling a feature in 8.2+ has just become a feature. The old behavior in 8.1, in which you could disable a single feature regardless of distributed health reporting, has been "improved".
My cases with Splunk Support were #2733102 and #2737559. SPL-213405 is Splunk's internal JIRA ticket tracking this issue. It may or may not show up in Splunk's release notes as a known issue; it is still being investigated. If you are dealing with Support, you can ask them to link your case to it.
My issue is that the docs say "You can disable any feature (...) for example, if you want to exclude a feature's status from the health report". So we expect to be able to disable a specific feature (e.g. Buckets) without having to disable distributed_health_reporter, which would also disable or hide a lot of other features in a typical topology with search head clusters and clustered indexers. In other words, we want to tell the Search Head to gray out an Indexer peer feature even if that peer is reporting health.
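For anyone following along, the two settings being contrasted would look roughly like this in health.conf. This is a sketch based on my reading of the spec: [feature:buckets] is my assumption for the stanza behind the Buckets feature, so verify the names for your version.

```ini
# health.conf on the search head (sketch; verify stanza names for your version)

# What we expect to work: disable one feature only.
# Per the updated docs, this is not reflected in the health report
# while distributed health reporting is enabled.
[feature:buckets]
disabled = 1

# The workaround: turn off distributed health reporting entirely.
# This also hides every other feature reported by the peers.
[distributed_health_reporter]
disabled = 1
```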