Monitoring Splunk

Splunk Daemon Not Responding

alacercogitatus
SplunkTrust

Greetings! We are running Splunk 5.0.3 with search head pooling (SHP, 2 search heads) and SSO=permissive. I get this error:


2013-06-06 16:06:41,656 ERROR [51b0ebb39d7fb184803e90] search:221 - Splunkd daemon is not responding: ('The read operation timed out',)
Traceback (most recent call last):
  File "/opt/splunk/lib/python2.7/site-packages/splunk/appserver/mrsparkle/controllers/search.py", line 218, in dispatchJob
    job = splunk.search.dispatch(q, sessionKey=cherrypy.session['sessionKey'], **options)
  File "/opt/splunk/lib/python2.7/site-packages/splunk/search/__init__.py", line 268, in dispatch
    serverResponse, serverContent = rest.simpleRequest(uri, postargs=args, sessionKey=sessionKey, rawResult=True)
  File "/opt/splunk/lib/python2.7/site-packages/splunk/rest/__init__.py", line 446, in simpleRequest
    raise splunk.SplunkdConnectionException, str(e)
SplunkdConnectionException: Splunkd daemon is not responding: ('The read operation timed out',)

To get more detail, I added this logging line to /opt/splunk/lib/python2.7/site-packages/splunk/rest/__init__.py:

logger.error('problem=splunkd_socket_connection_exception msg="%s" aTry=%s tries=%s wait=%s uri="%s" method=%s headers="%s" body="%s" serverResponse="%s" sessionSource="%s" proxyMode="%s" http_vars="%s" http_dir="%s" webkeyfile="%s" webcertfile="%s" error_dir="%s" pprint_error="%s" '%(e, aTry, tries, wait, uri, method, headers, payload, serverResponse, sessionSource, proxyMode, pprint(vars(h)), dir(h), str(getWebKeyFile()), str(getWebCertFile), dir(e), pprint(vars(e)) ) )

It outputs this:


2013-06-06 16:06:41,655 ERROR [51b0ebb39d7fb184803e90] __init__:445 - problem=splunkd_socket_connection_exception msg="The read operation timed out" aTry=0 tries=4 wait=10 uri="https://127.0.0.1:8089/servicesNS/USER/search/search/jobs" method=POST headers="{'Authorization': 'Splunk AUTHKEY'}" body="latest_time=1370542605.17&ui_dispatch_app=search&ui_dispatch_view=flashtimeline&max_count=10000&search=search%20index%3D_internal%20host%3Dhsearchp01%20sourcetype%3Dsplunk_web_service%20earliest%3D-2m%40m&earliest_time=1370542604&auto_cancel=100&required_field_list=%2A&time_format=%25s.%25Q&status_buckets=300" serverResponse="bullpucky" sessionSource="direct" proxyMode="False" http_vars="None" http_dir="['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_auth_from_challenge', '_conn_request', '_normalize_headers', '_request', 'add_certificate', 'add_credentials', 'authorizations', 'ca_certs', 'cache', 'certificates', 'clear_credentials', 'connections', 'credentials', 'disable_ssl_certificate_validation', 'follow_all_redirects', 'follow_redirects', 'force_exception_to_status_code', 'ignore_etag', 'optimistic_concurrency_methods', 'proxy_info', 'request', 'timeout']" webkeyfile="None" webcertfile="" error_dir="['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', 'args', 'errno', 'filename', 'message', 'strerror']" pprint_error="None"
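
A side note on the output above: http_vars and pprint_error both come out as "None" because pprint() prints to stdout and returns None, so there is nothing for "%s" to format. pformat() returns the text as a string instead. Similarly, str(getWebCertFile) without call parentheses formats the function object rather than its return value. Here is a minimal sketch of both effects; Handle and get_cert_file are hypothetical stand-ins, not Splunk or httplib2 names.

```python
from pprint import pformat, pprint


class Handle(object):
    """Hypothetical stand-in for the `h` object logged in the post."""

    def __init__(self):
        self.timeout = 30
        self.follow_redirects = True


def get_cert_file():
    """Hypothetical stand-in for a helper that returns a certificate path."""
    return "/tmp/example-cert.pem"


h = Handle()

# pprint() writes to stdout and returns None, so "%s" renders it as "None".
printed = pprint(vars(h))
broken = 'http_vars="%s"' % printed

# pformat() returns the formatted text, which is what a log line needs.
fixed = 'http_vars="%s"' % pformat(vars(h))

# Without parentheses the function object is formatted, not its result.
not_called = str(get_cert_file)
called = str(get_cert_file())

print(broken)
print(fixed)
print(not_called)
print(called)
```

Swapping pprint(...) for pformat(...) in the logging line would put the actual attribute values of the connection object and the exception into the log.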

I now don't know where else to check for issues. I thought this was fixed in 5.0.3 (SPL-66828), unless this is something else. The aTry variable is supposed to count the number of tries. It never gets past 0, which means the socket error happens before a second try!
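
One possible reading of aTry=0, offered as an assumption and not as Splunk's actual simpleRequest code: if the loop re-raises socket timeouts immediately and only retries some other class of error, the counter never advances on a timeout. The traceback's raise at line 446 would fit that shape. The function and exception choices below are hypothetical.

```python
import socket


def simple_request(send, tries=4):
    """Hypothetical retry loop; NOT the real splunk.rest.simpleRequest."""
    for a_try in range(tries):
        try:
            return send(), a_try
        except socket.timeout as e:
            # Assumption: a timeout is converted and raised at once,
            # so the loop never reaches a second iteration.
            raise ConnectionError(
                "aTry=%s tries=%s msg=%s" % (a_try, tries, e))
        except AttributeError:
            # Assumption: only this kind of error is retried.
            continue
    raise ConnectionError("gave up after %s tries" % tries)


def always_times_out():
    """Simulates a splunkd call that never answers in time."""
    raise socket.timeout("The read operation timed out")


try:
    simple_request(always_times_out)
except ConnectionError as e:
    message = str(e)
    print(message)
```

If the real code behaves this way, tries=4 would have no effect on read timeouts, and the place to look is why splunkd is slow to answer, not the retry logic.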

1 Solution

alacercogitatus
SplunkTrust

We are now running 6.0.3, so this no longer applies to me. However, I think the root cause was disk I/O on the server.
