Hello,
I recently joined a project at a site with one ES search head and a few ad-hoc ones. Apparently a new issue started a few days before I got here: the ES SH (the others are fine) becomes unresponsive to web browsing until it is restarted. The web daemon and splunkd are still running, and alerts and searches continue to run.
Splunkd is just bombarded with "ERROR HttpClientRequest - HTTP client error=Read Timeout while connecting to server"
messages and pretty much nothing else. I'm trying to help them figure out what the issue is, but I've never seen something like this happen.
Any thoughts?
Hi, did you find out what caused this error?
How does the system resource usage on your search heads look? And is there a large number of concurrent searches? You may want to take a look at the "Search Activity: Instance" dashboard in the Monitoring Console to see if there are searches running using a large amount of memory.
That was my first thought as well, but I can't access anything on the SH when it's like that. Looking at the charts historically for when it happens, though, I don't see any spikes.
If the ES search head forwards its data to the indexer layer, you should be able to query its status from another search head: index=_internal host=ES_Search_Head
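To narrow that down to the error from the original post, something like the search below might help. It is only a sketch: it assumes the ES search head's internal logs really are reaching the indexers, and ES_Search_Head is a placeholder for whatever host value that instance actually reports. Run it from one of the ad-hoc search heads over a time range that covers an outage.

```
index=_internal host=ES_Search_Head sourcetype=splunkd log_level=ERROR component=HttpClientRequest
| timechart span=5m count
```

If the count climbs shortly before the web UI stops responding, that gives a time window to compare against the other splunkd components logging on the same host.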