Splunk web throws "ResponseNotReady" error

sgarvin55
Splunk Employee

Several times over the past few weeks we have tried to log into Splunk and have been greeted by a 500 Internal Server Error: ResponseNotReady.

Rebooting the server corrects the issue, but it returns in a day or two, so we have to restart the machine it's running on every few days.

We are running Splunk version 5.0.1 on a Windows 2008 R2 virtual machine, with 16 GB of RAM and 8 CPUs.

1 Solution

sgarvin55
Splunk Employee

This isn't technically a Splunk issue, but here's how we resolved it:

We were seeing the 500 errors, and also noticed this issue when using a Windows configuration tool that uses a phone-home style of connection. We saw the following error from that software:

Could not send report: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. - connect(2)

After searching Microsoft KB articles for the error, we found the following article that resolved the issue:

Title: When you try to connect from TCP ports greater than 5000 you receive the error 'WSAENOBUFS (10055)'

Symptom: If you try to set up TCP connections from ports that are greater than 5000, the local computer responds with the following WSAENOBUFS (10055) error message:

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

http://support.microsoft.com/kb/Q196271

Once I applied the change from the KB article, connections started flowing again to the operations system. The machine was able to authenticate to our Windows domain, splunkd started making connections to other machines to index, and we could connect to outside servers.
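
For anyone who wants to see how close a server is to this state before the 10055 errors start, here is a minimal Python sketch (not taken from the KB article) that tallies TCP connection states from captured `netstat -ano -p tcp` output and counts how many distinct local ports in the dynamic range are in use. The function name, the sample rows, and the default of 49152 for the start of the dynamic range are assumptions for illustration; check the actual range on your own system.

```python
from collections import Counter


def summarize_netstat(text, ephemeral_start=49152):
    """Count TCP rows per state and distinct local ports at or above ephemeral_start."""
    states = Counter()
    ephemeral_ports = set()
    for line in text.splitlines():
        fields = line.split()
        # Expected TCP row: Proto, Local Address, Foreign Address, State[, PID]
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, state = fields[1], fields[3]
        states[state] += 1
        # rpartition copes with both "10.0.0.5:49200" and "[::1]:49200"
        port_text = local.rpartition(":")[2]
        if port_text.isdigit() and int(port_text) >= ephemeral_start:
            ephemeral_ports.add(int(port_text))
    return states, len(ephemeral_ports)


# Made-up sample in the shape of `netstat -ano -p tcp` output (assumption).
NETSTAT_SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:8000           0.0.0.0:0              LISTENING       2100
  TCP    10.0.0.5:49200         10.0.0.9:8089          ESTABLISHED     2100
  TCP    10.0.0.5:49201         10.0.0.9:8089          TIME_WAIT       0
  TCP    10.0.0.5:49202         10.0.0.9:9997          TIME_WAIT       0
"""

states, used = summarize_netstat(NETSTAT_SAMPLE)
print(dict(states), "dynamic-range ports in use:", used)
```

If the count of dynamic-range ports keeps climbing between reboots, that would be consistent with the port exhaustion the KB article describes, though it does not by itself prove the cause.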

splunkIT
Splunk Employee

FYI: We have had good results so far applying the following hotfix to the systems exhibiting the error:
http://support.microsoft.com/kb/2577795

porcus
Engager

Allow me to tell of our experience with this issue by way of addition to the excellent post by sgarvin55.

We have been experiencing the exact same issue for several weeks now on our Splunk indexer (running Windows Server 2008 R2 Enterprise with Splunk 5.0). Whenever the issue came up, no outbound socket connections could be made, though inbound connections (e.g. RDP, telnet, HTTP), including those from forwarders, seemed to be unaffected. Since we couldn't find a way to resolve the issue, rebooting the server was the only course of action that helped. But even that was temporary; the problem would usually resurface within several (i.e. 5-7) days.
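
The symptom described above (outbound connections fail while inbound ones keep working) can be checked with a small probe run on the affected server. This is only a sketch: the function name is made up for illustration, and 10055 is the WSAENOBUFS code quoted in the KB article title elsewhere in this thread.

```python
import socket

WSAENOBUFS = 10055  # Winsock error code quoted in the KB article title


def probe_outbound(host, port, timeout=3.0):
    """Attempt one outbound TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # On Windows the Winsock code is exposed as exc.winerror;
        # on other platforms only exc.errno is set.
        code = getattr(exc, "winerror", None) or exc.errno
        if code == WSAENOBUFS:
            return False, "WSAENOBUFS (10055): no buffer space or queue full"
        return False, f"failed: {exc}"
```

Calling `probe_outbound(host, port)` against a few hosts the server normally talks to (for example a domain controller or another Splunk instance) should show whether every outbound attempt fails with 10055 or only connections to one particular target.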

For better or worse, I didn't come across this question until after we'd already opened a support case with both Splunk and Microsoft. I'm glad to report that our problem seems to be "fixed" at this point without having to reboot. Here's what we did:

  1. We followed the steps outlined in the Microsoft KB article (referenced by sgarvin55 above).
  2. Based upon a recommendation from the Microsoft support rep, we replaced the Winsock and Winsock2 registry keys and reset winsock as follows:
    1. Export the Winsock and Winsock2 registry keys from another "working" machine on our network having the same version, edition, build, service pack, etc. of Windows as our "non-working" Splunk server. (Note: The keys are HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Winsock and HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Winsock2.) Export each key to its own file.
    2. From the "non-working" splunk server, do the following:
      1. Back up (i.e. export) the Winsock and Winsock2 registry keys. (This is just a precautionary step.)
      2. Delete the Winsock and Winsock2 registry keys.
      3. Run netsh winsock reset from the command line
      4. Merge the Winsock and Winsock2 registry keys (backed up from the "working" machine) (e.g. via context menu Merge option)
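
For reference, the steps run on the "non-working" server can be written out as command lines. The Python sketch below only builds and prints the commands as a dry run; it does not execute anything, and it is not a recommendation to perform the procedure. The function name and the directory paths are hypothetical, and it assumes the donor keys were exported to files named Winsock.reg and Winsock2.reg.

```python
SERVICES = r"HKLM\SYSTEM\CurrentControlSet\Services"
KEYS = ("Winsock", "Winsock2")


def winsock_reset_plan(donor_dir, backup_dir):
    """Return the commands for the steps above, in order, without running them."""
    plan = []
    for key in KEYS:  # step 1: back up the local keys
        backup_file = "\\".join((backup_dir, key + ".reg"))
        plan.append(["reg", "export", "\\".join((SERVICES, key)), backup_file, "/y"])
    for key in KEYS:  # step 2: delete the local keys
        plan.append(["reg", "delete", "\\".join((SERVICES, key)), "/f"])
    plan.append(["netsh", "winsock", "reset"])  # step 3: reset winsock
    for key in KEYS:  # step 4: merge the keys exported from the working machine
        plan.append(["reg", "import", "\\".join((donor_dir, key + ".reg"))])
    return plan


# Hypothetical directories, used only to show the resulting command lines.
for command in winsock_reset_plan(r"C:\temp\donor", r"C:\temp\backup"):
    print(" ".join(command))
```

Printing the plan first makes it easy to review each command before deciding whether to run any of it by hand from an elevated prompt.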

Unfortunately, because we performed the two steps in sequence, I can't say for sure which one made the most difference. I suspect -- based upon feedback already given -- that the first step may have been all that was necessary.

Please note that I do not recommend performing the registry key replacement technique. I just wanted to share the good news that after taking these steps, splunkweb appears to be functioning "normally" and we didn't even have to reboot the server in order to get things back to "normal". (Time will tell how durable this fix has been.)

Having invested so much time trying to get to the bottom of this problem, I'd be very interested to hear of others' experiences with this issue.

Update

The course of action described above did NOT ultimately fix the problem for us. It only prolonged the period that splunkweb would run before we experienced the "ResponseNotReady" errors again. I suspect that the steps to reset winsock were ineffectual. Instead, I bet that the extension we've effectively been granted each time before needing to reboot the server again is attributable to the registry modification of the MaxUserPort value.
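
To see how many outbound ports the server actually has to work with, the dynamic port range can be read from the output of `netsh int ipv4 show dynamicport tcp`. As far as I know, that is where Windows Server 2008 and later report the range, with a default of 49152 through 65535. The Python sketch below parses captured output; the function name and the sample text are assumptions for illustration.

```python
import re


def parse_dynamic_port_range(text):
    """Return (start_port, number_of_ports) parsed from netsh dynamicport output."""
    start = re.search(r"Start Port\s*:\s*(\d+)", text)
    count = re.search(r"Number of Ports\s*:\s*(\d+)", text)
    if not (start and count):
        raise ValueError("dynamic port range not found in output")
    return int(start.group(1)), int(count.group(1))


# Sample in the shape of `netsh int ipv4 show dynamicport tcp` output (assumption).
NETSH_SAMPLE = """
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 49152
Number of Ports : 16384
"""

start, count = parse_dynamic_port_range(NETSH_SAMPLE)
print(f"dynamic ports {start}-{start + count - 1} ({count} total)")
```

A larger range only buys more time if something is leaking ports, which matches what we saw: the errors came back, just later.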

I'm about to try the MS Hotfix recommended by splunkIT. After reading the KB article associated with it, I am much more confident that it will resolve the issue.

sgarvin55
Splunk Employee

I went back to verify which OS this customer was using. It was:

[System Summary]

Item      Value
OS Name   Microsoft Windows Server 2008 R2 Enterprise
Version   6.1.7601 Service Pack 1 Build 7601

He did use the MS article to resolve HIS particular issue.

ekolkmeier
Engager

Did the original poster try this fix, and can they confirm that it resolved the issue?

The KB mentioned does not state that it applies to Windows Server 2008 R2, but the 'Applies To' page at the link hasn't been updated since 2009.

I currently have a support case open for this exact issue on Windows Server 2008 R2 with Splunk 5.0.1. We did not see this issue prior to the 5.x upgrade.
