What is the cause of these socket errors reported in splunkd.log since upgrading to 6.0?

Splunk Employee

Shortly after upgrading to 6.0, errors such as this one started showing up in splunkd.log:

WARN HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/-/search/admin/summarization: Broken pipe

In addition - and this is a lot more worrying - Splunk Web sometimes becomes inaccessible and indexing is interrupted.

What could be causing this?

1 Solution

Splunk Employee

As it turns out, Splunk 6.0's reworked REST HTTP server introduces new self-imposed limits on the number of threads and sockets it allows itself to use. This is visible on startup in splunkd.log:

INFO  loader - Limiting REST HTTP server to 341 sockets
INFO  loader - Limiting REST HTTP server to 341 threads

Each limit is set to roughly one third of the open file descriptor limit imposed on splunkd by the operating system. In this case, the open file descriptor limit was 1,024, which resulted in a self-imposed limit of 341 threads and 341 sockets:

INFO  ulimit - Limit: open files: 1024 files
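
As a quick arithmetic check of the figures above, here is a minimal Python sketch. The function name is made up for illustration, and the rounding-down rule is an assumption inferred from the log lines (1024 / 3 = 341.33, logged as 341); it is not taken from splunkd itself.

```python
def estimated_rest_limit(open_files_limit: int) -> int:
    """Assumed rule: one third of the file descriptor limit, rounded down."""
    return open_files_limit // 3


# 1024 is the "open files" limit reported by the ulimit log line above.
print(estimated_rest_limit(1024))  # 341
```

If the startup messages in your own splunkd.log show a different ratio, trust the log over this sketch.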

When either limit is hit, splunkd cannot honor further REST API calls, and anything that depends on those calls can fail.

To prevent this from happening, one should raise or lift the per-process file descriptor limit on systems that are dedicated to running Splunk.

Alternatively, one can change how splunkd sets these self-imposed limits in the [httpServer] stanza of server.conf:

maxThreads = <int>
    * Number of threads that can be used by active HTTP transactions.
      This can be limited to constrain resource usage.
    * If set to 0 (the default) a limit will be automatically picked
      based on estimated server capacity.
    * If set to a negative number, no limit will be enforced.

maxSockets = <int>
    * Number of simultaneous HTTP connections that we'll accept simultaneously.
      This can be limited to constrain resource usage.
    * If set to 0 (the default) a limit will be automatically picked
      based on estimated server capacity.
    * If set to a negative number, no limit will be enforced.
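
To make the three cases in the spec text above concrete (0, negative, positive), here is a small Python sketch that reads a sample stanza and describes what each value would mean. The sample values and the helper names are hypothetical, and Python's configparser is only a rough stand-in for how Splunk actually parses .conf files.

```python
import configparser

# Hypothetical sample; the values are illustrative, not recommendations.
SAMPLE_SERVER_CONF = """
[httpServer]
maxThreads = 0
maxSockets = 5000
"""


def describe_limit(value: int) -> str:
    """Translate a maxThreads/maxSockets value per the spec text above."""
    if value == 0:
        return "automatic (picked from estimated server capacity)"
    if value < 0:
        return "no limit enforced"
    return f"explicit limit of {value}"


def describe_http_limits(conf_text: str) -> dict:
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the camelCase key names as written
    parser.read_string(conf_text)
    section = parser["httpServer"] if parser.has_section("httpServer") else {}
    # A missing key is treated as 0, the documented default.
    return {
        key: describe_limit(int(section.get(key, "0")))
        for key in ("maxThreads", "maxSockets")
    }


for key, meaning in describe_http_limits(SAMPLE_SERVER_CONF).items():
    print(f"{key}: {meaning}")
```

Whatever value you set, check the "Limiting REST HTTP server to ..." lines in splunkd.log after a restart to see what was actually applied.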

Explorer

Did anyone ever get this resolved or get closer to identifying the cause? There seems to be a common thread regarding the "/summarization" REST endpoints. I'm currently seeing the same broken pipe messages for this same endpoint for multiple apps.

Like many others, I don't appear to be hitting the socket/thread thresholds, or at least I don't see any occurrences of the "WARN HttpListener - Can't handle request for threads already in use" error.

Builder

Hi! Did you ever find out why you were getting the broken pipe warning? I seem to have encountered the same problem. One observation from my side is that I'm low on free RAM on the machine, but I don't know whether this is related.

Motivator

I'm on Splunk 7.1.5 and I have not seen them so far.

Motivator

I get these: WARN HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/-/search/admin/summarization: Broken pipe

Splunk Employee

I'm having the same issue:

WARN HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/nobody/search/data/inputs/rest/Twitter/: Broken pipe

I have also bumped up my ulimits to the levels below and set maxSockets and maxThreads to -1. Any idea what else I should change or look at? Thanks!

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127433
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 16384
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Engager

Raising these limits did not work for me. Even if I raise them to 'unlimited' by setting the value to -1, the same error (broken pipe) keeps showing up. After a restart of Splunk, you can run 1 or 2 queries before the web interface stops responding, and after that only server errors keep showing up...

Contributor

Was this ever resolved for you guys?

Engager

Not for me... but I have not tried since, because AIX is no longer supported in the newest releases.

Communicator

We have many applications, but we see thousands of broken pipe WARN events per day on only one of the REST endpoints. None of the answers or comments seem to address this.
/servicesNS/_/thisONEapp/admin/summarization

New Member

Did you get this resolved? The same is happening with one particular REST API call of ours. Other old and new REST calls work just fine.

Splunk Employee

Example of HTTP limits:
Since Splunk 6.*, the default limit is based on 33% of the lower of two values: the soft limit on the number of processes and the soft limit on the number of open files.

04-22-2014 12:46:00.111 -0400 INFO ulimit - Limit: user processes: 1024 processes [hard maximum: 256007 processes]
04-22-2014 12:47:25.757 -0400 INFO loader - Limiting REST HTTP server to 341 threads

The server eventually reaches that limit, which causes connections to fail:

04-21-2014 12:50:02.404 -0400 WARN HttpListener - Can't handle request for /services/server/info, 341 threads already in use
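
The rule described above (a third of the lower soft limit) can be sketched in Python as follows. This is an approximation under stated assumptions: the function names are made up, the floor division by 3 is inferred from the 1024 -> 341 log lines, and splunkd may factor in other things such as estimated server capacity. The `resource` module is Unix-only.

```python
import resource


def soft_limit(which: int):
    """Return the soft limit for a resource, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(which)
    return None if soft == resource.RLIM_INFINITY else soft


def estimated_thread_limit(nproc_soft, nofile_soft):
    """Assumed rule: a third of the lower finite soft limit, rounded down."""
    finite = [v for v in (nproc_soft, nofile_soft) if v is not None]
    if not finite:
        return None  # both unlimited: nothing to derive a limit from
    return min(finite) // 3


# 1024 user processes comes from the log excerpt above;
# 4096 open files is a hypothetical value for illustration.
print(estimated_thread_limit(1024, 4096))  # 341

# Limits of the current process (not necessarily those splunkd runs with).
print(estimated_thread_limit(soft_limit(resource.RLIMIT_NPROC),
                             soft_limit(resource.RLIMIT_NOFILE)))
```

Note that the second call reports the limits of whatever shell or user runs the script. splunkd may start with different limits, so the "INFO ulimit" lines in splunkd.log remain the authoritative source.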

Motivator

I have a ticket open right now for version 6.2.2. We are running into this issue with our deployment server where we hit the max number of threads (about 2500 agents checking in every 3 minutes). After finding this post we raised the hard and soft ulimit from 10240 to 102400, stopped/started Splunk, and later in the day ran into the issue again. In reading this thread again I looked at the server.conf file. Both maxThreads and maxSockets are set to 0. In looking at the internal logs after boot Splunk is showing a limit of 34,133 sockets and only 2,673 threads.

Why weren't those settings put into limits.conf?

Motivator

Adjusted maxThreads to -1 and after stopping and starting Splunk it is still saying there is a limit of 2,673 threads. Running btool shows only one maxThreads setting in play.

Motivator

I should probably add that we appear to hit this issue after restarting the deployment server. Looking at netstat, we see all kinds of connections in the CLOSE_WAIT state. It takes a Splunk restart (very, very slow) to address the issue at this point.

Builder

This is still in known issues.
SPL-82389 server.conf In [httpServer] server stanza, maxThreads/maxSockets do not accept negative numbers.

Set the max values to something high and see if it works.

Path Finder

Thanks Joe for pointing this out. Big help.

Splunk Employee

In splunkd.log you should receive a message similar to:
WARN HttpListener - Can't handle request for threads already in use
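
If you want to check a splunkd.log for these messages, here is a rough Python sketch that counts them. The regular expressions are written against the message forms quoted in this thread (including the fuller "..., 341 threads already in use" variant) and may need adjusting for other versions; the function name and sample lines are for illustration only.

```python
import re

LIMIT_RE = re.compile(r"Limiting REST HTTP server to (\d+) (threads|sockets)")
EXHAUSTED_RE = re.compile(
    r"Can't handle request for (\S+), (\d+) threads already in use")
BROKEN_PIPE_RE = re.compile(
    r"Socket error from (\S+) while accessing (\S+): Broken pipe")


def summarize(lines):
    """Collect the configured limits and count the two kinds of warning."""
    summary = {"limits": {}, "exhausted": 0, "broken_pipe": 0}
    for line in lines:
        match = LIMIT_RE.search(line)
        if match:
            summary["limits"][match.group(2)] = int(match.group(1))
        elif EXHAUSTED_RE.search(line):
            summary["exhausted"] += 1
        elif BROKEN_PIPE_RE.search(line):
            summary["broken_pipe"] += 1
    return summary


# Sample lines copied from messages quoted in this thread.
sample = [
    "04-22-2014 12:47:25.757 -0400 INFO loader - Limiting REST HTTP server to 341 threads",
    "04-21-2014 12:50:02.404 -0400 WARN HttpListener - Can't handle request for /services/server/info, 341 threads already in use",
    "WARN HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/-/search/admin/summarization: Broken pipe",
]
print(summarize(sample))
```

To run it against a real file, pass an open file object instead of the sample list. A non-zero "exhausted" count points to the thread limit being hit, while broken pipe warnings with a zero "exhausted" count suggest a different cause, as several posters here report.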

Motivator

In my logs (6.1.5) it is not "WARN HttpListener" but rather "WARN  HttpListener" (two spaces).

I don't see any "Can't handle request for threads already in use" errors. But I see 7-15 of these every 15 minutes in the logs:

WARN HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/-/search/admin/summarization: Broken pipe

My limits are:

INFO loader - Limiting REST HTTP server to 3413 sockets
INFO loader - Limiting REST HTTP server to 397 threads

My ulimit info from the logs:

INFO ulimit - Limit: virtual address space size: unlimited
INFO ulimit - Limit: data segment size: unlimited
INFO ulimit - Limit: resident memory size: unlimited
INFO ulimit - Limit: stack size: 10485760 bytes [hard maximum: unlimited]
INFO ulimit - Limit: data file size: unlimited
INFO ulimit - Limit: open files: 10240 files
INFO ulimit - Limit: user processes: 10000 processes [hard maximum: unlimited]

Explorer

Have you seen whether Splunk logs when the limit is imposed and threads are killed? If it does, could you share where that is being logged? Thanks!
