In our initial deployment we had 7 hosts sending data to 2 HFs acting merely as gateways. These send all data to a LB VIP in another location, which passes it to 2 more HFs that finally send it to the indexers. This was working fine, so we decided to deploy SplunkForwarder to 60 new machines. After that deployment, no data arrived at the indexers.
In the HFs (the first tier) splunkd.log:
01-16-2015 06:24:33.020 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:54056. Broken pipe
01-16-2015 06:24:49.244 -0300 ERROR TcpInputProc - Error encountered for connection from src=X.X.X.X:58987. Broken pipe
In the UF (SplunkForwarder 6.0.4 on Windows) splunkd.log:
01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF01:9997 timed out
01-16-2015 07:03:12.091 +0200 WARN TcpOutputProc - Cooked connection to ip=HF02:9997 timed out
In HF01 (same for HF02) inputs.conf:
host = HF01
index = testindex
disabled = 0
queueSize = 7MB
In the UF outputs.conf:
server = HF01, HF02
Telnet to port 9997 from the Windows boxes to HF01 and HF02 worked fine, and iperf showed good connection stats. Monitoring the connections on the HFs with: watch "netstat -patn | grep 9997" we saw a lot of connections in SYN_SENT state but none ESTABLISHED. So connections were being opened but never fully established.
After searching around for "TcpInputProc Broken pipe" and "TcpOutputProc timeout" without finding any solution, the clue came from this post: http://answers.splunk.com/answers/43259/intermediate-forwarder-connections-timeout.html
Adding:
connection_host = none
to the [splunktcp://9997] section of the HF inputs.conf solved the issue.
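Put together, the HF01 configuration would look roughly like the sketch below. It assumes that the settings listed earlier for HF01 live in the same [splunktcp://9997] stanza, which the listing above does not show explicitly, so adjust it to wherever those settings sit in your own file.

```
# inputs.conf on HF01 (HF02 would differ only in the host value)
# Assumption: the existing settings belong to this stanza.
[splunktcp://9997]
host = HF01
index = testindex
disabled = 0
queueSize = 7MB
# Do not derive the host from the connecting address,
# so no reverse DNS lookup is done per incoming connection.
connection_host = none
```

As with other inputs.conf changes, the HF most likely needs a restart before the new setting takes effect.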
So the cause was that the Splunk HFs were doing reverse DNS resolution for the incoming connections, and this was making the connections from the UFs time out.
Finally, this is not a question. I am only documenting it in case it helps other people.