I have recently run into an issue where each indexer in the index cluster is logging multiple "WARN HttpListener [HttpDedicatedIoThread:0] Socket error from [Search Head IP] while accessing /services/streams/search: broken pipe" warnings.
There is an SH cluster and a standalone SH. The standalone houses an app that does heavy backend searching, and the majority of the errors come from the standalone. Looking at "I/O Operations per second" and "Storage I/O Saturation (cold/hot/opt)" in the Monitoring Console, all instances are below 1%.
I am not sure which settings to adjust to fix this error. Would adjusting maxSockets and/or maxThreads in server.conf help? Both are currently set to their defaults. Or should I be looking at values in limits.conf?
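For reference, this is how I'm confirming the values currently in effect (I'm assuming the [httpServer] stanza of server.conf is where both of these live, so correct me if that's wrong):
# show effective [httpServer] settings and which conf file they come from
$SPLUNK_HOME/bin/splunk btool server list httpServer --debug | grep -iE 'maxsockets|maxthreads'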
This is happening on version 9.0.1. Any suggestions to help solve this would be much appreciated. Thanks!
From what you’ve described, the “Socket error … broken pipe” on /services/streams/search is typically not the actual root failure but a downstream symptom of Splunk search connections being cut unexpectedly between the Search Head and indexers. In most cases, this happens when the indexer side closes the streaming connection because the underlying search process has stalled, been terminated, or hit a temporary resource constraint, even when overall system metrics like IOPS or CPU look normal on average.
The fact that you are also seeing preforked search processes hanging and captain disconnect messages strongly suggests intermittent search pipeline saturation or scheduling delays rather than a simple networking or thread configuration issue. Increasing values like maxSockets or maxThreads in server.conf may reduce visible errors temporarily, but it often just pushes more load into an already constrained search layer instead of solving the underlying bottleneck.
In environments like this, the real issue is usually found in short bursts of CPU pressure, dispatch directory latency, search concurrency limits in limits.conf, or file descriptor exhaustion at the OS level; the averages can look fine while momentary spikes are enough to stall or kill individual search processes.
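If you want to rule out the concurrency and file-descriptor angles quickly, a couple of Linux-side checks (a rough sketch; verify the limits.conf attribute names against your version's limits.conf.spec before changing anything):
# effective search concurrency settings from the [search] stanza
$SPLUNK_HOME/bin/splunk btool limits list search --debug | grep -iE 'base_max_searches|max_searches_per_cpu'
# file descriptors actually available to the running splunkd process (not just your shell)
cat /proc/$(pgrep -o splunkd)/limits | grep -i 'open files'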
A good next step is to correlate timestamps between indexer search process logs and SH streaming errors to identify whether searches are being killed, delayed, or timing out under load.
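As a concrete starting point for that correlation, something like this against _internal puts both message types on one timeline (a rough sketch; adjust the text filters to your exact log messages and add "by host" if you want to see whether specific indexers stand out):
index=_internal sourcetype=splunkd ("Socket error from" "broken pipe") OR ("preforked process" "hung up")
| eval msg_type=if(searchmatch("hung up"), "preforked hang-up", "broken pipe")
| timechart span=1m count by msg_type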
We recently upgraded our deployment (IDX Cluster + SH Cluster) from 8.2.6 to 9.0.1
Since the upgrade, we see a huge number of warnings like:
WARN HttpListener [8466 HttpDedicatedIoThread-7] - Socket error from <ip:port> while accessing <URI>: Broken pipe
Along with this, there are a huge number of errors for Captain Disconnected ("This member has marked the connection to the search head captain as down"), timeouts, and skipped searches.
These errors/warnings typically occur during peak loads.
So far we have found that THP was enabled on a few of the SH cluster members. ulimits were set to the Splunk-recommended values, but we are in the process of increasing them in our prod environment.
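For anyone checking the same things on their own hosts, these are the commands we're using (Linux; the THP path can vary by distro, and splunkd also logs its effective ulimits at startup):
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
grep -i 'ulimit' $SPLUNK_HOME/var/log/splunk/splunkd.log | tail -20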
Further troubleshooting is pending; will get back with updates.
To add to the above, I have also found this error in splunkd.log on the indexers:
ERROR SearchProcessRunner PreforkedSearchesManager-0 - preforked process = 0/253717 hung up
This error aligns with the "broken pipe" warnings in my post above. There are no skipped searches.
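In the meantime, this is the search I'm using to see when the hang-ups occur and which indexers they hit (run from anywhere that can see the indexers' _internal data), so I can compare the spikes against our peak search times:
index=_internal sourcetype=splunkd component=SearchProcessRunner "hung up"
| timechart span=5m count by host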
I'm at a loss as to what steps I should take next to troubleshoot this issue. Again, any help/suggestions would be appreciated.
The same thing happened to our infrastructure after upgrading from 8.2.9 -> 9.0.5. Has anybody figured out what's causing these messages and what the impact is?
07-21-2023 07:42:16.414 +0200 ERROR SearchProcessRunner [24412 PreforkedSearchesManager-0] - preforked process=063487 hung up
Seeing the same after an upgrade from v8 to v9.0.6.
I'm suspecting something went wrong during the upgrade but don't have any solid evidence yet.
Did anyone manage to get to the bottom of this?
Did any of you ever fix this issue? If so, how?
Thanks!
I was never able to fix the issue. Still trying to figure it out.
We are still seeing these ERROR messages since the upgrade. We never found any evidence or a root cause 😕