In our initial deployment we had 7 hosts sending data to 2 HFs acting merely as gateways. Those send all data to a LB VIP in another location, which passes it to 2 more HFs that finally send it to the indexers. This was working OK, so we decided to deploy SplunkForwarder to 60 new machines. After that deployment, no data arrived at the indexers.
On the HFs (the first pair), splunkd.log showed:
01-16-2015 06:24:33.020 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:54056. Broken pipe
01-16-2015 06:24:49.244 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:58987. Broken pipe
On the UF (SplunkForwarder 6.0.4 on Windows), splunkd.log showed:
01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF01:9997 timed out
01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF02:9997 timed out
inputs.conf on HF01 (HF02 is the same):
host = HF01
index = testindex
disabled = 0
queueSize = 7MB
outputs.conf on the UF:
server = HF01:9997, HF02:9997
Telnet to port 9997 worked fine from the Windows box to HF01 and HF02, and iperf showed good connection stats. Monitoring the connections on the HFs with: watch "netstat -patn | grep 9997" we saw a lot of SYN_SEND connections but none ESTABLISHED. So connections were being attempted, but none of them was ever fully established.
After searching around for "TcpInputProc Broken pipe" and "TcpOutputProc timed out" without finding any solution, the clue came from this post: http://answers.splunk.com/answers/43259/intermediate-forwarder-connections-timeout.html
Adding:
connection_host = none
to the [splunktcp://9997] section of the HF inputs.conf solved the issue.
So the problem was that the HFs were doing a reverse DNS lookup for each incoming connection, and this was making the connections from the UFs time out.
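For anyone who wants to see it in one place, here is a sketch of what the receiving stanza on each HF ends up looking like. The first four settings are the ones posted above; I am assuming they sit under the [splunktcp://9997] stanza, since the post does not show the stanza header. Only connection_host is new.

```ini
# inputs.conf on HF01 (HF02 is the same apart from "host")
# Assumption: the settings from the post live under this stanza.
[splunktcp://9997]
host = HF01
index = testindex
disabled = 0
queueSize = 7MB
# Do not resolve the connecting forwarder's IP address,
# so no reverse DNS lookup is made when a UF connects.
connection_host = none
```

connection_host also accepts ip and dns. In general splunkd has to be restarted on the HF before a change to inputs.conf takes effect.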
Finally, this is not a question... I'm only documenting it in case it helps other people.
This error is caused by the heartbeat function. By default the forwarder sends a heartbeat to the indexer every 30 seconds. If the indexer doesn't receive it within that time, it writes the timeout message to its log. Network devices such as firewalls, or long remote connections, can cause this. I disabled the heartbeat. Another solution could be to change the frequency from the default 30 seconds...
* How often (in seconds) to send a heartbeat packet to the receiving server.
* Heartbeats are only sent if sendCookedData=true.
* Defaults to 30 (seconds).
It is a mechanism for the forwarder to know that the receiver (i.e. the indexer) is alive. If the indexer does not send a return packet to the forwarder, the forwarder will declare this receiver unreachable and stop forwarding data to it. By default a packet is sent every 30s.
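The setting described in the spec text quoted above is heartbeatFrequency in outputs.conf on the forwarder. A minimal sketch of lengthening the interval follows; the group name primary_hfs and the value 60 are hypothetical, chosen only for illustration, and the server list is taken from the original post with the port seen in the logs.

```ini
# outputs.conf on the forwarder
# "primary_hfs" is a made-up group name; use your own.
[tcpout:primary_hfs]
server = HF01:9997, HF02:9997
# Send a heartbeat every 60 seconds instead of the default 30.
# 60 is an example value, not a recommendation.
heartbeatFrequency = 60
```

As the spec text notes, this only has an effect when sendCookedData=true.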