500 Internal Server Error after upgrading to 6.3.x...

szabados · 11-30-2015

After upgrading my Splunk cluster to 6.3.1, I get a "500 Internal Server Error" all the time after logging in to any of my Splunk instances (search head, deployment server, ...).

My splunkd.log is full of lines like this:
WARN HttpListener - Can't handle request for /services/broker/connect/<peer id>, max thread limit for REST HTTP server is 2729, threads already in use is 2729

My web_service.log ends with things like this when the issue happens:

2015-11-30 08:21:49,790 DEBUG [565c071d9a4b27ac7a58] cplogging:55 - [30/Nov/2015:08:21:49] HTTP Traceback (most recent call last):
File "E:\Splunk\Python-2.7\Lib\site-packages\cherrypy\_cprequest.py", line 606, in respond
cherrypy.response.body = self.handler()
File "E:\Splunk\Python-2.7\Lib\site-packages\cherrypy\_cpdispatch.py", line 25, in __call__
return self.callable(*self.args, **self.kwargs)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 38, in rundecs
return fn(*a, **kw)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 118, in check
return fn(self, *a, **kw)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 167, in validate_ip
return fn(self, *a, **kw)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 246, in preform_sso_check
update_session_user(sessionKey, remote_user)
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 189, in update_session_user
en = splunk.entity.getEntity('authentication/users', user, sessionKey=sessionKey)
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\entity.py", line 249, in getEntity
serverResponse, serverContent = rest.simpleRequest(uri, getargs=kwargs, sessionKey=sessionKey, raiseAllErrors=True)
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\rest\__init__.py", line 567, in simpleRequest
raise splunk.RESTException, (serverResponse.status, serverResponse.messages)
RESTException: [HTTP 503] General server error

stevepraz · 11-30-2015

What OS are you running on? We are having a similar problem on Windows virtual servers running Splunk following our upgrade to 6.3. 6.3.1 didn't seem to help either. Recycling Splunk solves the issue temporarily, but it comes back. We first had the issue on our Windows search head; once we rolled back to a 6.2 release, the issue went away. Now we are also seeing the issue on our indexers that are on 6.3, but rolling back isn't really an option there. Our Linux search head hasn't seen the issue.

The system appears to be running into its own self-imposed max HTTP threads limit. When the server is in this condition, it returns 500 errors and eventually fails health checks. However, logging onto the server shows that CPU, memory and other system vitals are barely being used at all.

My first thought was that by overriding maxThreads in server.conf we could escape the issue, but that doesn't solve it either. I'm thinking maybe something in 6.3 is either using more of these threads or not cleaning them up as fast. The problem is that I don't see any way to measure it other than the messages that come up when you've hit the limit.
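Since those warnings are the only signal mentioned here, one rough way to track them is to scan splunkd.log for the message quoted at the top of the thread. The following is a minimal sketch, not a Splunk tool: the function names are made up for illustration, and the pattern assumes the warning text always has the same wording as the line quoted above. It only shows when the limit was already hit, not how thread usage grows beforehand.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern copied from the splunkd.log warning quoted in this thread.
# Assumption: other occurrences of the warning use the same wording.
LIMIT_RE = re.compile(
    r"max thread limit for REST HTTP server is (\d+), "
    r"threads already in use is (\d+)"
)


def parse_thread_warning(line: str) -> Optional[Tuple[int, int]]:
    """Return (limit, in_use) for a thread-limit warning line, else None."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def summarize(lines: Iterable[str]) -> dict:
    """Count thread-limit warnings and collect the distinct limits reported."""
    hits = [parsed for parsed in map(parse_thread_warning, lines) if parsed]
    return {
        "warnings": len(hits),
        "limits_seen": sorted({limit for limit, _ in hits}),
    }
```

To use it, open splunkd.log as a text file and pass the file object to `summarize`; comparing the warning count before and after a configuration change gives a crude measure of whether the change helped.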

szabados · 12-04-2015

Hi,

I tried the same thing, and it seems to have solved the issue for me.
Just a tip: there is a Splunk article about this in which the stanza name is written with a lowercase s, [httpserver].
In the server.conf spec, it is written with a capital S, [httpServer]. The first time, I copied it with the lowercase s and it didn't work; after correcting it to the capital S, it solved the issue.
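Because the wrong letter case fails silently, a quick check of the file can catch it. This is a minimal sketch under stated assumptions: `find_miscased_stanzas` is a hypothetical helper, not part of Splunk, it treats any line of the form `[name]` as a stanza header, and the maxThreads value in the sample is only a placeholder, not a recommended setting.

```python
import re
from typing import List, Tuple

# Stanza name as written in the server.conf spec, per the post above.
EXPECTED = "httpServer"

# Assumption: a stanza header is a line consisting only of "[name]".
STANZA_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def find_miscased_stanzas(conf_text: str, expected: str = EXPECTED) -> List[Tuple[int, str]]:
    """Return (line_number, stanza_name) for stanzas that match `expected`
    only when letter case is ignored."""
    problems = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = STANZA_RE.match(line)
        if match is None:
            continue
        name = match.group(1).strip()
        if name != expected and name.lower() == expected.lower():
            problems.append((number, name))
    return problems


# Illustrative input only; the value below is a placeholder.
sample = """\
[httpserver]
maxThreads = 5000
"""
print(find_miscased_stanzas(sample))  # prints [(1, 'httpserver')]
```

An empty list means no stanza differs from `[httpServer]` by case alone; it does not confirm that the stanza is present or that its settings are valid.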

stevepraz · 12-04-2015

Thanks for the update. I stumbled into the same copy/paste issue you mentioned but later realized it. That fix appeared to improve my uptime, but recycles were still required, just less frequently.

Since I posted this, I got more information on the case I originally opened: my sales engineer mentioned that there is definitely a bug open for this specific issue that will be addressed in a future release.

My resolution was to migrate my indexers to Linux to restore stability to my environment.

mzorzi · 11-30-2015

You might just have a resource problem. If your system does meet the recommended specifications, then you should try the upgrade again, but stop Splunk first
