Splunk Enterprise

What is this error "Socket error from [Search Head IP] while accessing /services/streams/search: broken pipe"?

jencot01
Explorer

I recently ran into an issue: multiple "WARN HttpListener [HttpDedicatedIoThread:0] Socket error from [Search Head IP] while accessing /services/streams/search: broken pipe" warnings on every indexer in the indexer cluster.

There is a search head (SH) cluster and a standalone SH. The standalone SH hosts an app that runs heavy backend searches, and the majority of the errors come from it. In the Monitoring Console, "I/O Operations per second" and "Storage I/O Saturation (cold/hot/opt)" are below 1% on all instances.
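To confirm which search head the warnings come from, a short Python sketch can tally the broken-pipe warnings in an indexer's splunkd.log by source IP and URI. It assumes IPv4 sources logged as <ip>:<port> (the port is dropped so all connections from one search head are grouped together); count_by_source is a hypothetical helper, not part of Splunk.

```python
import re
from collections import Counter

# Matches the HttpListener warning quoted in this thread, e.g.
# "Socket error from 10.0.0.5:51234 while accessing /services/streams/search: Broken pipe".
# Assumption: IPv4 source logged as <ip>:<port>; the port is discarded so that
# all connections from the same search head are counted together.
BROKEN_PIPE = re.compile(
    r"Socket error from (?P<ip>[\d.]+)(?::\d+)? while accessing (?P<uri>\S+): broken pipe",
    re.IGNORECASE,
)


def count_by_source(lines):
    """Return a Counter of (source IP, URI) pairs for broken-pipe warnings."""
    counts = Counter()
    for line in lines:
        match = BROKEN_PIPE.search(line)
        if match:
            counts[(match.group("ip"), match.group("uri"))] += 1
    return counts
```

For example, open $SPLUNK_HOME/var/log/splunk/splunkd.log on an indexer and print count_by_source(fh).most_common(10); if the standalone SH dominates, that matches the observation above.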

I am not sure which settings to adjust to fix this error. Would adjusting maxSockets and/or maxThreads in server.conf help? Both are currently at their defaults. Or should I be looking at values in limits.conf?
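Before changing maxSockets or maxThreads, it helps to confirm whether either is overridden anywhere. `splunk btool server list httpServer --debug` shows the merged result across all apps; as a rough offline check of a single file, the Python sketch below reads server.conf text with configparser and reports each setting from the [httpServer] stanza, or "default" when it is not set. It assumes the file is close enough to INI syntax for configparser, which Splunk .conf files are not guaranteed to be; http_server_limits is a hypothetical helper.

```python
import configparser

# The two settings asked about; both belong to the [httpServer] stanza.
HTTP_SERVER_KEYS = ("maxSockets", "maxThreads")


def http_server_limits(conf_text):
    """Return {key: value or "default"} for the [httpServer] keys in server.conf text.

    Assumption: the text is INI-like enough for configparser. Settings that
    appear before any stanza are collected under a placeholder section.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep Splunk's camelCase key names
    parser.read_string("[__global__]\n" + conf_text)
    section = parser["httpServer"] if parser.has_section("httpServer") else {}
    return {key: section.get(key, "default") for key in HTTP_SERVER_KEYS}
```

A result of {"maxSockets": "default", "maxThreads": "default"} only reflects that one file; btool remains the authoritative view.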

This is happening on version 9.0.1. Any suggestions to help solve this would be much appreciated. Thanks!


philipvaughn
New Member

Based on what you've shared, I'd start by validating OS-level limits (ulimits) and transparent huge pages (THP), then cautiously test increased socket/thread values once the system baseline is stable. Hope this helps; I'd be interested to hear what you find once you dig into those settings.
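THP status can be checked by reading the kernel's sysfs files, where the active option is the one shown in brackets. A minimal sketch, assuming the standard Linux paths under /sys/kernel/mm/transparent_hugepage (some distributions use different paths); active_mode and thp_report are hypothetical helpers. Splunk's guidance is to run with THP disabled, i.e. the active mode should be "never".

```python
from pathlib import Path

# Standard Linux sysfs locations for transparent huge page settings.
THP_FILES = (
    "/sys/kernel/mm/transparent_hugepage/enabled",
    "/sys/kernel/mm/transparent_hugepage/defrag",
)


def active_mode(sysfs_value):
    """Return the bracketed (active) option, e.g. "always madvise [never]" -> "never"."""
    for token in sysfs_value.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_report(paths=THP_FILES):
    """Map each existing THP sysfs file to its active mode; missing files are skipped."""
    report = {}
    for path in map(Path, paths):
        if path.exists():
            report[str(path)] = active_mode(path.read_text())
    return report
```

Run it on every SH and indexer; any value other than "never" is worth following up on.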


anirban_td
Explorer

We recently upgraded our deployment (IDX cluster + SH cluster) from 8.2.6 to 9.0.1.

Since the upgrade, we have seen a huge number of warnings like:

WARN HttpListener [8466 HttpDedicatedIoThread-7] - Socket error from <ip:port> while accessing <URI>: Broken pipe

Along with these, there are a huge number of errors for "Captain Disconnected", "This member has marked the connection to the search head captain as down", timeouts, and skipped searches.

These errors/warnings typically occur during peak loads.
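To check whether the warnings really cluster at peak load, their timestamps can be bucketed by hour and compared against search concurrency or load graphs. This sketch assumes the splunkd.log timestamp format that appears elsewhere in this thread (e.g. 07-21-2023 07:42:16.414 +0200); hourly_counts is a hypothetical helper.

```python
import re
from collections import Counter
from datetime import datetime

# Leading splunkd.log timestamp, e.g. "07-21-2023 07:42:16.414 +0200".
TS = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})")
TS_FMT = "%m-%d-%Y %H:%M:%S.%f %z"


def hourly_counts(lines, needle="Broken pipe"):
    """Count lines containing `needle` (case-insensitive) per hour.

    Keys are timezone-aware datetimes truncated to the hour; lines without a
    parseable leading timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if needle.lower() not in line.lower():
            continue
        match = TS.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), TS_FMT)
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts
```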

So far, we have found that transparent huge pages (THP) were enabled on a few of the SH cluster members. ulimits were set to Splunk's recommended values, but we are in the process of increasing them in our production environment.
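A quick way to compare the current per-process limits against recommended values is Python's resource module. The thresholds below (64000 open files, 16000 user processes) are an assumption based on values commonly cited in Splunk's documentation; confirm them for your version. Run it as the user that runs splunkd, and keep in mind that a running process's effective limits (e.g. /proc/<pid>/limits on Linux) can differ from those of a login shell. below_recommended is a hypothetical helper.

```python
import resource

# ASSUMED thresholds (commonly cited Splunk recommendations); verify for your version.
RECOMMENDED = {
    "open files (nofile)": (resource.RLIMIT_NOFILE, 64000),
    "user processes (nproc)": (resource.RLIMIT_NPROC, 16000),
}


def below_recommended(recommended=RECOMMENDED):
    """Return {name: current soft limit} for every limit below its threshold.

    An unlimited soft limit (RLIM_INFINITY) counts as meeting any threshold.
    """
    low = {}
    for name, (which, minimum) in recommended.items():
        soft, _hard = resource.getrlimit(which)
        if soft != resource.RLIM_INFINITY and soft < minimum:
            low[name] = soft
    return low
```

An empty result means every checked soft limit meets the assumed threshold.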

Further troubleshooting is pending; we will get back with updates.


jencot01
Explorer

To add to the above, I have also found this error in splunkd.log on the indexers:

ERROR SearchProcessRunner  PreforkedSearchesManager-0   - preforked process = 0/253717  hung up

This error coincides with the "broken pipe" warnings in my post above. There are no skipped searches.

I'm at a loss as to what steps I should take next to troubleshoot this issue. Again, any help or suggestions would be appreciated.
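To check whether each "preforked process ... hung up" error really lines up with a broken-pipe warning, the two message types can be paired by timestamp. A sketch, assuming both come from the same indexer's splunkd.log in the timestamp format shown in this thread, with an arbitrary 5-second window; correlate is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# Leading splunkd.log timestamp, e.g. "07-21-2023 07:42:16.414 +0200".
TS = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})")
TS_FMT = "%m-%d-%Y %H:%M:%S.%f %z"


def _timestamps(lines, needle):
    """Yield timestamps of lines containing `needle` (case-insensitive)."""
    for line in lines:
        match = TS.match(line)
        if match and needle.lower() in line.lower():
            yield datetime.strptime(match.group(1), TS_FMT)


def correlate(lines, window=timedelta(seconds=5)):
    """Return (hung_up_time, nearest_broken_pipe_time) pairs no more than `window` apart."""
    lines = list(lines)  # iterate twice, once per message type
    hung = list(_timestamps(lines, "hung up"))
    pipes = list(_timestamps(lines, "broken pipe"))
    pairs = []
    for t in hung:
        nearest = min(pipes, key=lambda p: abs(p - t), default=None)
        if nearest is not None and abs(nearest - t) <= window:
            pairs.append((t, nearest))
    return pairs
```

Comparing len(correlate(lines)) with the total number of "hung up" errors shows how tightly the two messages are linked.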


mika703
Engager

The same thing happened in our infrastructure after upgrading from 8.2.9 to 9.0.5. Has anybody figured out what is causing these messages and what their impact is?


07-21-2023 07:42:16.414 +0200 ERROR SearchProcessRunner [24412 PreforkedSearchesManager-0] - preforked process=063487 hung up


Gregster66
Engager

I'm seeing the same after upgrading from v8 to v9.0.6.
I suspect something went wrong during the upgrade, but I don't have any solid evidence yet.
Did anyone manage to get to the bottom of this?


tfellinger
New Member

Did any of you ever fix this issue? If so, how?

Thanks!


jlc00
Loves-to-Learn Lots

I was never able to fix the issue. Still trying to figure it out.


mika703
Engager

We have still been getting these ERROR messages since the upgrade. We never found any evidence of the root cause 😕
