Why does splunkd close internal sockets before the receive queue has been emptied? This appears to leave them lying around in CLOSE_WAIT instead of moving on to CLOSED.
$ netstat -tap |awk 'NR<3||/:8089/'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 example.com:8089 *:* LISTEN 14836/splunkd
tcp 38 0 example.com:54043 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54092 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54097 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54106 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54041 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54019 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54042 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54044 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53727 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54073 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53499 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54103 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53495 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53959 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54098 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53257 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54094 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54102 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53498 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53496 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54016 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54017 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54096 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53730 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54108 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54104 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53729 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53084 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54095 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53258 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54090 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53255 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53728 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53726 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53962 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54039 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54034 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53964 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54101 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54099 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53961 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54107 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54105 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53497 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53492 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53963 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53254 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54018 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54087 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:53256 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54093 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54091 example.com:8089 CLOSE_WAIT 14927/python
tcp 38 0 example.com:54100 example.com:8089 CLOSE_WAIT 14927/python
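For what it's worth, the state itself is easy to reproduce outside of Splunk with a few lines of Python (a toy sketch, nothing Splunk-specific; the 38-byte payload just mirrors the Recv-Q column above). The server sends and closes; the client never reads and never calls close(), so its socket sits in CLOSE_WAIT with the unread bytes queued:

import socket, subprocess, threading, time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))           # ephemeral port; stands in for 8089
srv.listen(1)
port = srv.getsockname()[1]

def server():
    conn, _ = srv.accept()
    conn.sendall(b"x" * 38)          # unread payload ends up in the client's Recv-Q
    conn.close()                     # server sends FIN; client enters CLOSE_WAIT

threading.Thread(target=server, daemon=True).start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
time.sleep(0.5)                      # never recv(), never close()

# Requires net-tools netstat ("ss -tn" works too); the client socket shows CLOSE_WAIT:
out = subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout
print("\n".join(l for l in out.splitlines() if ":%d" % port in l))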
I have the same problem, except I've got exactly 469 bytes left in the Recv-Q and thousands of leaked connections.
This has completely killed our installation of Splunk (and our enthusiasm for the product). Support wasn't able to help us and we've basically given up. 😞 If anyone finds a solution, please post it here.
According to http://blog.olivierlanglois.net/index.php/2008/06/05/close_wait_vs_time_wait:
"A TCP connection goes into the CLOSE_WAIT state when it receives a FIN segment from its peer. From that point the connection becomes half-duplex and the TCP connection will not receive any new data from its peer ... the socket will stay there as long as the server does not call close() explicitly on the socket."
Therefore, this does look like the socket simply never being closed, rather than an optimization technique. Note which side is leaking: the CLOSE_WAIT entries above all belong to the python process (PID 14927), the client end of the connections to port 8089. In other words, splunkd has already sent its FIN; it's the python client that is sitting on the sockets without draining the last 38 bytes or calling close().
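If that reading is right, the fix belongs in whatever python code is driving those connections: drain the response until EOF and then close. A minimal sketch of the pattern (hypothetical client code, not Splunk's actual SDK):

import socket

def fetch(host, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((host, port))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:          # empty read == the peer sent FIN (EOF)
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        s.close()                 # moves the socket out of CLOSE_WAIT (via LAST_ACK) so it can be reaped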
I believe CLOSE_WAIT can actually be deliberate -- it's TCP's half-close. Once the peer sends its FIN, no new data will arrive from it, but the local side can keep sending on the same connection without establishing a new one, right up until it calls close() itself. However, I'm not an expert at this, so just thought I'd mention it as a possibility, and leave the real answers to someone who knows for sure. Good question though.
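The half-close part is easy to see in isolation, for whatever that's worth. In this toy Python sketch (loopback only, nothing Splunk-specific), one side shuts down its write half; the other side lands in CLOSE_WAIT but can still send until it closes:

import socket, threading, time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def peer():
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", port))
    c.shutdown(socket.SHUT_WR)           # send FIN: "I'm done sending"
    print("peer got back:", c.recv(64))  # ...but can still read the reply
    c.close()

threading.Thread(target=peer).start()
conn, _ = srv.accept()
time.sleep(0.2)                          # let the FIN arrive; conn is now in CLOSE_WAIT
assert conn.recv(64) == b""              # EOF: nothing new will come from the peer
conn.sendall(b"still writable")          # sending still works in CLOSE_WAIT
conn.close()                             # our FIN finishes the teardown

Note this is half-close semantics, not connection reuse: the CLOSE_WAIT end can still write, but it will never receive anything new, and the state only clears when close() is finally called.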