Hi,
We have a cluster of 6 search heads in our environment. We upgraded Splunk to 6.5.3 on 12th April. Since the upgrade, the memory, CPU, and disk utilization of 5 of the 6 search heads is not showing up on our dashboards.
Please find the screenshot below for the memory usage trend of our search heads.
As you can see, the memory utilization trend for 5 of the 6 search heads is missing after 12th April. The CPU and disk utilization trends have the same problem. Kindly help me resolve this.
Can you post the SPL for this dashboard? Also, can you check the _internal index on the missing SHs for any errors regarding introspection?
Are you still getting _internal data from those SHs?
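A quick way to answer that last question is to ask the indexers when each search head last reported. This is only a sketch: it assumes the same host="lelnx*" naming pattern used in the dashboard search, and that you run it over a time range that spans the upgrade date.

```
| tstats count max(_time) as last_seen where (index=_internal OR index=_introspection) host="lelnx*" by host, index
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| sort host, index
```

A search head that is absent from the results, or whose last_seen stops around the upgrade date, is not getting its internal data to the indexers under its own host name.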
Edit: Answered in the comments below
Hi brreeves,
Thank you for a quick response.
SPL for Memory Utilization:
index="_introspection" host="lelnx*" component=hostwide
| lookup ESsearch_lookup host OUTPUT alias
| eval Memory_Used=round(('data.mem_used'/1024),2)
| timechart span=1h avg(Memory_Used) by alias usenull=f limit=0
Here, I am using a lookup to convert host names into their aliases.
No, we are not getting internal logs from the missing SHs.
Looks like listening ports have been shut down and also some issues with mongod/KV store.
So, one issue at a time... let's get the SHs sending data to the indexers into the _introspection index. Are you OK to fix those listening ports? And can you handle the KVStore errors too?
Will restarting Splunk on the SHs help?
If not, kindly help me understand how to fix listening ports and handle KV Store errors.
It would re-open the listening ports on the SHs, if those are the ports you were talking about.
I need to know WHICH listening ports are closed, and what the KVStore errors are 😉
Hey Brreeves,
The issue is resolved. It seems that etc/system/local was copied from one of the SHs to all of the search heads. This created an identity crisis, as it changed the names of the ES search heads internally. I fixed that manually and refreshed the configuration on the DMC.
But if I remember correctly, we did not make any configuration changes recently. Could this have been caused by the 6.5.3 upgrade?
Many thanks for helping me out.
Copying the etc/system/local folder is not a standard upgrade process, nor is it facilitated by our installers.
Glad you got it all figured out!
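For anyone who lands here with the same symptom: etc/system/local holds server.conf (serverName) and inputs.conf (the default host value), so copying that folder between search heads can make several instances report under one identity. A rough check from the DMC might look like the search below. It assumes the search heads are configured as search peers of the DMC instance, and the field names are what I would expect /services/server/info to return, so verify them on your version.

```
| rest /services/server/info splunk_server=*
| table splunk_server serverName host guid version
| sort serverName
```

If two rows show the same serverName or host but different guid values, those instances are sharing an identity and their server.conf and inputs.conf need to be corrected individually.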