Getting Data In

Broken pipe (in splunkd.log on HF) + timeouts (in splunkd.log on UF), no data indexed

enrictid
New Member

Hi,
In our initial deployment we had 7 hosts sending data to 2 HFs acting merely as gateways, which forward all data to a LB VIP in another location. The VIP passes the data to 2 more HFs that finally send it to the indexers. This was working fine, so we decided to deploy SplunkForwarder to 60 new machines. After that deployment, no data arrived at the indexers.

In splunkd.log on the first-tier HFs:

01-16-2015 06:24:33.020 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:54056. Broken pipe
01-16-2015 06:24:49.244 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:58987. Broken pipe

In splunkd.log on the UF (SplunkForwarder 6.0.4 on Windows):

01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF01:9997 timed out
01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF02:9997 timed out

inputs.conf on HF01 (same on HF02):

[default]
host = HF01
index = testindex
[splunktcp://9997]
disabled = 0
queueSize = 7MB

outputs.conf on the UF:

[tcpout]
defaultGroup=indexer1
[tcpout:indexer1]
server = HF01, HF02

Telnet to port 9997 worked fine from the Windows box to HF01 and HF02, and iperf showed good connection stats. Monitoring the connections on the HF with: watch "netstat -patn | grep 9997", we saw a lot of SYN_SEND connections but none ESTABLISHED. So connections were being attempted but never fully established.

After searching around for "TcpInputProc Broken pipe" and "TcpOutputProc timeout" without finding a solution, the clue came from this post: http://answers.splunk.com/answers/43259/intermediate-forwarder-connections-timeout.html
Adding:
connection_host = none
to the [splunktcp://9997] section of the HF inputs.conf solved the issue.
So the problem was that the HFs were doing reverse DNS resolution, and this was causing the connections from the UFs to time out.
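
For anyone who wants to see it in place, this is roughly how the receiving stanza on the HF looks with the fix applied. It is only a sketch: the port and queueSize are copied from the inputs.conf shown earlier, and the only new line is connection_host. The comment reflects what was observed in this case, not a general statement about every deployment.

```ini
# inputs.conf on HF01 / HF02 (sketch based on the stanza posted above)
[splunktcp://9997]
disabled = 0
queueSize = 7MB
# Added line: do not derive the host from the incoming connection.
# In this case it stopped the reverse DNS lookups that were stalling the UFs.
connection_host = none
```

The forwarder needs a restart after the change for it to take effect.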

Finally, this is not a question. It's only documented here in case it helps other people.

hliakathali_spl
Splunk Employee

This error is caused by the heartbeat function. Every 30 seconds the forwarder sends a heartbeat to the indexer. If the indexer doesn't get it within that time, the indexer logs the timeout message. Network devices such as a firewall, or long remote connections, can cause this. I disabled the heartbeat. Another solution could be to change the frequency from the default 30 seconds:

heartbeatFrequency = <integer>
* How often (in seconds) to send a heartbeat packet to the receiving server.
* Heartbeats are only sent if sendCookedData=true.
* Defaults to 30 (seconds).

It is a mechanism for the forwarder to know that the receiver (i.e. the indexer) is alive. If the indexer does not send a return packet to the forwarder, the forwarder will declare this receiver unreachable and will not forward data to it. By default a packet is sent every 30s.
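
As a rough illustration, changing the heartbeat interval on the forwarder could look like the fragment below. The group name and server list are taken from the outputs.conf in the original post. The value 60 is only an example, not a recommendation, and placing the setting at the target-group level is an assumption that should be checked against outputs.conf.spec for your version.

```ini
# outputs.conf on the UF (sketch; 60 is an illustrative value)
[tcpout]
defaultGroup=indexer1

[tcpout:indexer1]
server = HF01, HF02
# Send a heartbeat every 60 seconds instead of the default 30.
heartbeatFrequency = 60
```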

Thank you
