I'm running the latest 5.x on my search head and have noticed lately that more and more users are randomly getting a "500 Internal Server Error" when trying to access or edit saved searches.
What would cause this and how do I fix the issue so that users do not get the 500 error?
Most likely the instance is busy, and its response to the REST endpoint query is taking longer than Splunkweb waits for a response, which is 30 seconds by default.
This can happen more frequently on search heads that service many users or have a large number of scheduled searches that are running in the background with a large dispatch directory.
Looking in the web_service.log will show results like these:
2013-09-25 18:52:11,076 ERROR [5243921ef67fd8b01eae10] search:227 - Splunkd daemon is not responding: ('The read operation timed out',)
Traceback (most recent call last):
File "/opt/splunk/lib/python2.7/site-packages/splunk/appserver/mrsparkle/controllers/search.py", line 224, in dispatchJob
job = splunk.search.dispatch(q, sessionKey=cherrypy.session['sessionKey'], **options)
File "/opt/splunk/lib/python2.7/site-packages/splunk/search/__init__.py", line 268, in dispatch
serverResponse, serverContent = rest.simpleRequest(uri, postargs=args, sessionKey=sessionKey, rawResult=True)
File "/opt/splunk/lib/python2.7/site-packages/splunk/rest/__init__.py", line 443, in simpleRequest
raise splunk.SplunkdConnectionException, str(e)
SplunkdConnectionException: Splunkd daemon is not responding: ('The read operation timed out',)
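To get a feel for how often this happens, the log can be scanned for these timeout errors. The following is only a rough sketch, not a Splunk tool: the function name count_timeouts_by_hour is made up for illustration, and it assumes the log lines start with a timestamp in the same format as the sample above.

```python
import re
import sys
from collections import Counter

# Matches a timestamped ERROR line that carries the timeout message, e.g.
# "2013-09-25 18:52:11,076 ERROR [...] ... ('The read operation timed out',)"
# Group 1 captures the date and hour ("2013-09-25 18").
TIMEOUT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} ERROR .*The read operation timed out"
)


def count_timeouts_by_hour(lines):
    """Count splunkd read timeouts per hour (hypothetical helper).

    Traceback lines have no leading timestamp, so each error is counted once.
    """
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to your web_service.log as the first argument.
    with open(sys.argv[1]) as log_file:
        for hour, count in sorted(count_timeouts_by_hour(log_file).items()):
            print(hour, count)
```

Hours with many hits are a hint as to when the search head is busiest.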
There is also a helpful view in the Splunk S.O.S. app called "HTTP Response Times For splunkd" that will show more detail on response times and what is being accessed.
Every time the response takes longer than 30 seconds, you will get a 500 error when trying to access that object.
To allow Splunkweb to wait longer than the default 30 seconds, you can edit the
$SPLUNK_HOME/lib/python2.7/site-packages/splunk/rest/__init__.py
file and change the value of the following line:
SPLUNKD_CONNECTION_TIMEOUT = 30
to a value that is suitable for your response times. In the above case, changing this value to 50 or 60 would probably work, since we had a few instances where the response took 30 to 40 seconds. Once the file is edited and saved, you will need to restart the Splunk instance for the change to take effect, and then you should no longer get the 500 errors.
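If you would rather script the edit than make it by hand, something along these lines might do. This is a minimal sketch under assumptions: the helper names get_timeout and set_timeout are made up, and it assumes the setting sits at the start of a line in exactly the form shown above. It only works on text, so reading and writing the file (and keeping a backup copy) is left to you.

```python
import re

# Assumes the setting appears at the start of a line, as in
# "SPLUNKD_CONNECTION_TIMEOUT = 30". Group 1 is the name and "=",
# group 2 is the current numeric value.
SETTING_RE = re.compile(r"^(SPLUNKD_CONNECTION_TIMEOUT\s*=\s*)(\d+)", re.MULTILINE)


def get_timeout(source_text):
    """Return the current timeout in seconds, or None if the line is missing."""
    match = SETTING_RE.search(source_text)
    return int(match.group(2)) if match else None


def set_timeout(source_text, seconds):
    """Return a copy of source_text with the timeout value replaced."""
    new_text, replaced = SETTING_RE.subn(
        lambda m: m.group(1) + str(seconds), source_text, count=1
    )
    if replaced == 0:
        raise ValueError("SPLUNKD_CONNECTION_TIMEOUT line not found")
    return new_text
```

For example, read the contents of __init__.py into a string, check get_timeout() returns 30, write the result of set_timeout(text, 60) back, and then restart Splunk as described above.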