Let me add our experience with this issue to the excellent post by sgarvin55.
We have been experiencing the EXACT same issue for several weeks now on our Splunk indexer (running Windows Server 2008 R2 Enterprise with Splunk 5.0). Whenever it came up, no outbound socket connections could be made, though inbound connections (e.g. RDP, telnet, HTTP), including those from forwarders, seemed to be unaffected. Since we couldn't find a way to resolve the issue, rebooting the server was our only remedy. Even that was temporary: the problem would usually resurface within 5-7 days.
For better or worse, I didn't come across this question until after we'd already opened support cases with both Splunk and Microsoft. I'm glad to report that our problem seems to be "fixed" at this point without having to reboot. Here's what we did:
1. We followed the steps outlined in the Microsoft KB article (referenced by sgarvin55 above).
2. Based upon a recommendation from the Microsoft support rep, we replaced the Winsock and Winsock2 registry keys and reset Winsock as follows:
   a. On another "working" machine on our network having the same version, edition, build, service pack, etc. of Windows as our "non-working" Splunk server, export the Winsock and Winsock2 registry keys, each to its own file. (Note: both keys live under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.)
   b. On the "non-working" Splunk server, do the following:
      i. Back up (i.e. export) the Winsock and Winsock2 registry keys. (This is just a precautionary step.)
      ii. Delete the Winsock and Winsock2 registry keys.
      iii. Run netsh winsock reset from the command line.
      iv. Merge the Winsock and Winsock2 registry keys exported from the "working" machine (e.g. via the context menu Merge option).
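For anyone who wants to see the registry steps above as actual commands, here is a rough Python sketch that only builds (and by default only prints) the command lines. The directory names and .reg file names are hypothetical placeholders, not what we used, and the sketch assumes the standard `reg export`, `reg delete`, `reg import` and `netsh winsock reset` commands run from an elevated prompt. Nothing is executed unless you explicitly ask for it.

```python
import subprocess

# Both keys live under this Services key (HKLM = HKEY_LOCAL_MACHINE).
SERVICES = r"HKLM\SYSTEM\CurrentControlSet\Services"
KEYS = ("Winsock", "Winsock2")


def build_commands(backup_dir, donor_dir):
    """Return the command lines for the steps above, in order.

    backup_dir: where to save the precautionary backups (hypothetical path).
    donor_dir:  where the .reg files exported from the "working" machine
                were copied (hypothetical path; assumed names Winsock.reg
                and Winsock2.reg).
    """
    commands = []
    # Back up the existing keys first (precaution).
    for key in KEYS:
        backup_file = backup_dir + "\\" + key + "-backup.reg"
        commands.append(["reg", "export", SERVICES + "\\" + key, backup_file, "/y"])
    # Delete the existing keys.
    for key in KEYS:
        commands.append(["reg", "delete", SERVICES + "\\" + key, "/f"])
    # Reset the Winsock catalog.
    commands.append(["netsh", "winsock", "reset"])
    # Merge the keys exported from the "working" machine.
    for key in KEYS:
        commands.append(["reg", "import", donor_dir + "\\" + key + ".reg"])
    return commands


def run(commands, execute=False):
    """Print each command; run it only when execute=True."""
    for cmd in commands:
        print(" ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

Calling `run(build_commands(r"C:\winsock-backup", r"C:\winsock-donor"))` prints the seven commands as a dry run so they can be reviewed before anything touches the registry.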
Unfortunately, because we performed the two steps in sequence, I can't say for sure which one made the difference, though I suspect, based upon feedback already given, that the first step (following the KB article) may have been all that was necessary.
Please note that I do not recommend performing the registry key replacement technique. I just wanted to share the good news that after taking these steps, splunkweb appears to be functioning "normally", and we didn't even have to reboot the server to get there. (Time will tell how durable this fix is.)
Having invested so much time trying to get to the bottom of this problem, I'd be very interested to hear of others' experiences with this issue.
Update
The course of action described above did NOT ultimately fix the problem for us. It only prolonged the period of time that splunkweb would run before we experienced the "ResponseNotReady" errors again. I suspect that the steps to reset Winsock were ineffectual. Instead, I bet that the extra time we now get before needing to reboot the server again is attributable to the registry change to the MaxUserPort value.
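Since MaxUserPort sets the highest port number TCP/IP will hand out for outbound connections, one way to watch the problem build up is to count TCP connections by state over time. The sketch below is only that, a sketch: it assumes the failures come from running out of outbound ports (which I have not confirmed), and it assumes the usual `netstat -an` column layout where the state is the fourth column on TCP lines.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -an` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines look like: TCP  local:port  remote:port  STATE
        # (UDP lines have no state column and are skipped).
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


def current_tcp_states():
    """Run netstat on this machine and summarize its output."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)
```

Logging `current_tcp_states()` every few minutes would show whether some state (TIME_WAIT, for example) keeps climbing in the days before outbound connections start failing.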
I'm about to try the MS Hotfix recommended by splunkIT. After reading the KB article associated with it, I am much more confident that it will resolve the issue.