Where do I configure the health.conf so that I can disable the IOWaits alert?

BlueSocket · ‎05-07-2022

Dear All,

I have a Search Head, Deployment Server, Monitoring Console, a Cluster Manager, an Indexer Cluster and two unclustered Indexers.

On the Monitoring Console, I get alerts about high IOWait on the two unclustered indexers. This has only been happening since we upgraded to 8.2.5.

There is no evidence of any issue other than this alert in Splunk Web, and I want to disable it. I am using the following KB article:

https://docs.splunk.com/Documentation/Splunk/8.2.5/Admin/Healthconf

On the Monitoring Console server, I have put the following into the etc\apps\search\local\health.conf file:

[feature:iowait]
alert:sum_top3_cpu_percs__max_last_3m.disabled = 1

However, the alert still appears in Splunk Web on the Monitoring Console server.

Why is this? Am I configuring health.conf on the wrong server or in the wrong folder? When I run splunk cmd btool health list, I see the configuration there, but Splunk is not doing as it is told! If I am doing the wrong thing, can someone point me to documentation that explains what I should be doing?

Thanks in advance!

pellegrini · ‎09-06-2022

The Health feature has caused some confusion regarding local vs. distributed configuration. I have investigated this, and it is very flexible to configure, even though the docs are not so clear about it. So far I have not found any Answers post that cannot be solved using standard configuration.

If you configure this locally on the Monitoring Console (DMC), as you described, the setting only applies to the DMC host itself. There is no distributed threshold. You need to configure it on each and every Splunk Enterprise instance that raises the alert (e.g. your standalone indexers). One option is Splunk Web: under the Settings menu on each instance, either change the values and click Save, or toggle the Status between Disable and Enable. This takes effect immediately and does not require a restart.

If you instead configure health.conf directly on each instance, for example to disable iowait, put this in health.conf:

[feature:iowait]
disabled = 1

Then reload the configuration, e.g. via http://<your_splunk>:<splunk_port>/debug/refresh

If you have an indexer cluster and distribute the change by applying a cluster bundle, this will trigger a restart of the peers (seen on version 8.2.4).
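
To make the per-instance configuration concrete, here is a sketch of the two variants discussed in this thread. The file location is an assumption (any local directory that btool picks up on the indexer itself should work), and the note on Variant B is my reading of health.conf.spec rather than something I have verified on every version:

```ini
# health.conf on EACH standalone indexer, not only on the Monitoring Console.
# Assumed location: $SPLUNK_HOME/etc/system/local/ or an app's local/ directory.

# Variant A: turn off the whole IOWait feature in the health report.
[feature:iowait]
disabled = 1

# Variant B (the setting from the original question): keep the feature,
# but disable alerting for one indicator only. As far as I can tell this
# does not stop the indicator from showing in the health report.
# [feature:iowait]
# alert:sum_top3_cpu_percs__max_last_3m.disabled = 1
```

Use one variant or the other in a given file. After the change, splunk cmd btool health list --debug on the indexer should show which file the setting comes from.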

Hope this solves your issue.

pellegrini · ‎09-06-2022

In addition, you can use these searches to benchmark iowait over time, so you can set thresholds that are relevant for your environment. Just replace the host filter (host=ind0*) with your own hostnames:

CPU IOwait average

index=_internal  source="*/splunk/var/log/splunk/health.log"   
feature=IOWait component=PeriodicHealthReporter node_type=indicator 
indicator=avg_cpu__max_perc_last_3m host=ind0* 
| timechart span=30s max(measured_value) min(due_to_threshold_value) by host

CPU IOwait single CPU

index=_internal  source="*/splunk/var/log/splunk/health.log"   
feature=IOWait component=PeriodicHealthReporter node_type=indicator 
indicator=single_cpu__max_perc_last_3m host=ind0* 
| timechart span=300s max(measured_value) min(due_to_threshold_value) by host

CPU IOwait top3 CPU

index=_internal  source="*/splunk/var/log/splunk/health.log"  
feature=IOWait component=PeriodicHealthReporter node_type=indicator 
indicator=sum_top3_cpu_percs__max_last_3m host=ind0* 
| timechart span=30s max(measured_value) min(due_to_threshold_value) by host
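
Once you have a baseline from these searches, the thresholds can be raised in health.conf on each indexer instead of disabling the feature outright. A minimal sketch, assuming the indicator:<name>:<color> setting format from health.conf.spec. The numbers are placeholders, not recommendations, and should come from your own measured_value history:

```ini
# health.conf on each instance that raises the IOWait alert.
# All threshold values below are hypothetical placeholders.
[feature:iowait]
# Average iowait across all CPUs, max over the last 3 minutes
indicator:avg_cpu__max_perc_last_3m:yellow = 10
indicator:avg_cpu__max_perc_last_3m:red = 20
# Highest iowait on any single CPU
indicator:single_cpu__max_perc_last_3m:yellow = 15
indicator:single_cpu__max_perc_last_3m:red = 30
# Sum of the three busiest CPUs, so the scale is larger
indicator:sum_top3_cpu_percs__max_last_3m:yellow = 30
indicator:sum_top3_cpu_percs__max_last_3m:red = 60
```

After a reload, the due_to_threshold_value field in the searches above should reflect the new thresholds, which is a quick way to confirm that the change was picked up on the right host.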

richgalloway · ‎05-07-2022

Did you restart the MC server after changing the config file?

Have you tried making the same health.conf change on the indexers?

---
If this reply helps you, Karma would be appreciated.

BlueSocket · ‎05-10-2022

Yes, I restarted the MC several times after the changes to the configurations.

No, I have not edited the health.conf on the Indexers. They are quite difficult to restart at the moment and I was hoping that someone would have a KB or document that could help (or know definitively) before I went there.

