I recently started installing the Splunk Universal Forwarder on all of our Windows hosts. The deployment has gone fine so far, and I can see the logs on our Splunk Enterprise server.
What I don't like, though, is that running the "netstat" command on the Universal Forwarder side shows more than one connection to the Splunk server. One is ESTABLISHED, which is good, but there are 2-4 additional ones in the TIME_WAIT state. On some machines there may be one in the SYN_SENT state as well. The problem I foresee is performance degradation on the Splunk server side due to hundreds of connections waiting to time out (given that we have more than 150 Windows hosts).
Is it possible to tweak the Universal Forwarder, or the Splunk_TA_windows add-on, to limit the connections to one?
This is what the /var/log/splunkd.log file shows on one of the machines:
05-07-2015 15:26:14 WARN TcpOutputProc - Raw connection to ip=x.x.x.x:9998 timed out
05-07-2015 15:26:14 WARN TcpOutputProc - Ping connection to idx=x.x.x.x:9998 continuing connections
05-07-2015 15:26:44 WARN TcpOutputProc - Raw connection to ip=x.x.x.x:9998 timed out
05-07-2015 15:26:44 WARN TcpOutputProc - Ping connection to idx=x.x.x.x:9998 continuing connections
05-07-2015 15:27:14 WARN TcpOutputProc - Raw connection to ip=x.x.x.x:9998 timed out
05-07-2015 15:27:14 WARN TcpOutputProc - Ping connection to idx=x.x.x.x:9998 continuing connections
These are not the same thing. The TIME_WAIT, ESTABLISHED, and SYN_SENT states shown in the netstat output are all part of the standard TCP state transitions. They are a feature of the operating system (either Windows or *nix) and are perfectly normal in terms of how TCP works. Splunk uses the OS's TCP stack and cannot make it operate outside of the standards.
This image helps to visualize how TCP performs its state transitions. TIME_WAIT is a transitional state that is necessary to deal with any delayed TCP segments that might arrive after the socket has been closed. It is perfectly normal to see sockets in TIME_WAIT.
TCP is surprisingly robust. "Hundreds of connections" is nothing major for it to worry about. I would not be remotely concerned about that.
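If you want to keep an eye on how many sockets sit in each state, a small script can tally them from saved netstat output. This is only a sketch: it assumes the four-column layout that Windows `netstat -an` prints (Proto, Local Address, Foreign Address, State), the function name `count_tcp_states` is made up for illustration, and the sample addresses are invented. Port 9998 is the one from your log.

```python
from collections import Counter


def count_tcp_states(netstat_output: str, remote_port: int) -> Counter:
    """Tally TCP states for connections whose remote endpoint uses remote_port.

    Assumes the Windows `netstat -an` column layout:
    Proto, Local Address, Foreign Address, State.
    """
    states = Counter()
    suffix = f":{remote_port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows (UDP rows have no state).
        if len(fields) < 4 or not fields[0].upper().startswith("TCP"):
            continue
        # fields[2] is the foreign (remote) address, fields[3] is the state.
        if fields[2].endswith(suffix):
            states[fields[3]] += 1
    return states


# Invented sample data, for illustration only.
sample = """\
  TCP    10.0.0.5:49152    10.0.0.9:9998    ESTABLISHED
  TCP    10.0.0.5:49153    10.0.0.9:9998    TIME_WAIT
  TCP    10.0.0.5:49154    10.0.0.9:9998    TIME_WAIT
  TCP    10.0.0.5:49155    10.0.0.9:443     ESTABLISHED
"""

for state, count in sorted(count_tcp_states(sample, 9998).items()):
    print(state, count)
```

A handful of TIME_WAIT entries per forwarder is what you would expect to see; a growing number of SYN_SENT entries is the more interesting signal, since it means connection attempts are not being answered.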
What I would be concerned about are the TCP timeouts in your log. I would suspect either a misconfiguration or network difficulties between the forwarders and the indexer.
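One quick way to separate a network problem from a Splunk configuration problem is to attempt a plain TCP connection from a forwarder host to the indexer's receiving port, outside of Splunk. The sketch below uses only Python's standard `socket` module; the helper name `can_connect` and the 5-second default timeout are my own choices, not anything Splunk defines.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake or raises OSError
        # (which also covers timeouts and refused connections).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("x.x.x.x", 9998)` with your indexer's address in place of the masked one. If this fails or is intermittently slow from the hosts that log the timeouts, I would look at firewalls and routing first; if it succeeds reliably, the forwarder's output configuration and the indexer's receiving port are the next things to check.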