Broken pipe (in splunkd.log on HF) + timeouts (in splunkd.log on UF), no data indexing

New Member

Hi,
In an initial deployment we had 7 hosts sending data to 2 HFs acting merely as gateways. These send all data to a LB VIP in another location, which passes it to 2 more HFs that finally send it to the indexers. This was working fine, so we decided to deploy SplunkForwarder to 60 new machines. After that deployment, no data arrived at the indexers:

On the first-tier HFs, splunkd.log:

01-16-2015 06:24:33.020 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:54056. Broken pipe
01-16-2015 06:24:49.244 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:58987. Broken pipe

On the UFs (SplunkForwarder 6.0.4 on Windows), splunkd.log:

01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF01:9997 timed out
01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF02:9997 timed out

On HF01 (same on HF02), inputs.conf:

[default]
host = HF01
index = testindex
[splunktcp://9997]
disabled = 0
queueSize = 7MB

On the UFs, outputs.conf:

[tcpout]
defaultGroup=indexer1
[tcpout:indexer1]
server = HF01, HF02

Telnet to port 9997 worked fine from the Windows box to HF01 and HF02, and iperf showed good connection stats. Monitoring connections on the HFs with: watch "netstat -patn | grep 9997" we saw many connections in SYN_SENT but none ESTABLISHED. So connections were being attempted but never fully established.

After searching for "TcpInputProc Broken pipe" and "TcpOutputProc timeout" without finding a solution, the clue came from this post: http://answers.splunk.com/answers/43259/intermediate-forwarder-connections-timeout.html
Adding:
connection_host = none
to the [splunktcp://9997] section of inputs.conf on the HFs solved the issue.
The cause was that the HFs were performing reverse DNS resolution on incoming connections, and this was making the connections from the UFs time out.
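
For reference, the receiving stanza on each HF would look roughly like this with the fix in place. This is only a sketch assembled from the inputs.conf posted above plus the one added line. Nothing else was changed.

```ini
[splunktcp://9997]
disabled = 0
queueSize = 7MB
# Added line: in this deployment it stopped the reverse DNS lookups
# that were stalling incoming forwarder connections
connection_host = none
```

A restart of splunkd on the HF is typically needed for an inputs.conf change like this to take effect.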

Finally, this is not a question. It's documented here only in case it helps other people.

Splunk Employee

This error is caused by the heartbeat function. Every 30 seconds the forwarder sends a heartbeat to the indexer. If the indexer does not receive it within that time, it logs the timeout message. Network devices such as firewalls, or long remote connections, can cause this. I disabled the heartbeat. Another option is to change the heartbeat frequency from the default 30 seconds:

heartbeatFrequency = <integer>
* How often (in seconds) to send a heartbeat packet to the receiving server.
* Heartbeats are only sent if sendCookedData=true.
* Defaults to 30 (seconds).

It is a mechanism for the forwarder to know that the receiver (i.e. the indexer) is alive. If the indexer does not send a return packet to the forwarder, the forwarder declares the receiver unreachable and stops forwarding data to it. By default a packet is sent every 30 seconds.
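
As a rough illustration of changing the frequency, the setting could go in the forwarder's outputs.conf under the output group from the original post. The value 60 is an arbitrary example, not a recommendation, and the :9997 ports are taken from the log lines in the question rather than from the posted config.

```ini
[tcpout]
defaultGroup = indexer1

[tcpout:indexer1]
server = HF01:9997, HF02:9997
# Illustrative value only; the default is 30 seconds
heartbeatFrequency = 60
```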

Thank you