Deployment Architecture

Search Head Cluster Captain not transferring on captain failure - max thread limit for REST HTTP Server

sail4lot
Path Finder

Hi All-

We have a problem where our SHC captain seems to stop responding. netstat shows a bunch of CLOSE_WAIT connections that just persist, and the splunkd logs contain a bunch of errors such as "HttpListener ..... max thread limit for REST HTTP server is 5333."
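One way to quantify the lingering CLOSE_WAIT connections is to tally them per remote host from saved netstat output. The following is a minimal sketch, not a Splunk tool: it assumes Linux-style `netstat -ant` columns and the default management port 8089, and the sample addresses are made up for illustration.

```python
from collections import Counter

# Illustrative sample only; the addresses are invented, not from a real system.
SAMPLE = """\
tcp        0      0 10.0.0.5:8089     10.0.0.11:51234   CLOSE_WAIT
tcp        0      0 10.0.0.5:8089     10.0.0.11:51240   CLOSE_WAIT
tcp        0      0 10.0.0.5:8089     10.0.0.12:40022   ESTABLISHED
tcp        0      0 10.0.0.5:8000     10.0.0.13:40100   CLOSE_WAIT
"""


def count_close_wait(netstat_output: str, port: int = 8089) -> Counter:
    """Count CLOSE_WAIT sockets on a local port, grouped by remote host."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "CLOSE_WAIT" and local.rsplit(":", 1)[-1] == str(port):
            counts[foreign.rsplit(":", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # Print remote hosts with the most stuck connections first.
    for host, n in count_close_wait(SAMPLE).most_common():
        print(f"{host}\t{n}")
```

Feeding it real `netstat -ant` output from the captain at intervals would show whether the stuck connections come mostly from one peer and whether the count keeps climbing toward the thread limit.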

So the captain fails to respond to requests, and then the cluster just stops working altogether. I would have thought the other members would automatically elect a new captain (we have 5 hosts in total in the SHC). To remedy this, I end up having to totally reboot the captain. This brings things back, but obviously we want this setup to be more resilient.

I am hesitant to raise the thread limits in server.conf because it seems like the thread count will just continue growing. Ideally, we'd see the failure occur on the captain, and captaincy would just get transferred to another host.

Does anyone have any troubleshooting suggestions or remedies?

Thanks!


jfrazier060803
Loves-to-Learn Lots

Did you ever find a solution to this problem? We are having the exact same issue and haven't been able to figure it out. We are currently on version 9.0 (we had the issue well before upgrading to 9, though). We are connected to about 6 or 7 different indexer clusters in remote locations, and the network reliability for one of them isn't great. We have (for the most part) correlated the captain disconnecting with this one flaky indexer cluster going down.

I added the following to all my search heads this morning:

[httpServer]
maxThreads = -1

I will see if this helps, and if it does, I'll let you know. Could you provide any more details about your setup? I'm interested to see if it's similar to ours. Look forward to hearing from you.
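To confirm what the [httpServer] stanza actually sets on each search head, the file can be read with a small script. This is a rough sketch under stated assumptions: `http_thread_limit` is a hypothetical helper, the interpretation of the values (negative means no limit, 0 means an automatically chosen limit) reflects my reading of the server.conf spec and should be checked against the docs for your version, and Python's configparser will reject a conf file that has settings before the first stanza.

```python
import configparser


def http_thread_limit(conf_text: str) -> str:
    """Describe the maxThreads setting found in server.conf-style text."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)
    # Assumption: an absent setting behaves like 0 (automatic limit).
    value = int(parser.get("httpServer", "maxThreads", fallback="0"))
    if value < 0:
        return "no limit enforced"
    if value == 0:
        return "limit picked automatically"
    return f"limit of {value} threads"


if __name__ == "__main__":
    print(http_thread_limit("[httpServer]\nmaxThreads = -1\n"))
```

Note that this only reads one file; Splunk merges settings from several locations, so the effective value on a running instance may differ from what a single server.conf shows.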
