We are experiencing connection errors when placing a load balancer in front of our deployment servers. This is the type of error we are seeing on the deployment servers:
08-16-2018 11:49:59.213 -0400 WARN PubSubSvr - sender=connection_X.X.X.X_8089_abc.xyz.com channel=deploymentServer/phoneHome/default Message not dispatched (connection invalid)
The connection from the UFs to the deployment servers themselves is fine: we see no errors when the UFs use the FQDN of a deployment server, only when they use the load-balanced DNS name. The UFs do eventually connect through the LB, but we see a lot of the errors above. Do the LBs have to be set up a certain way for this to work correctly, for example with sticky sessions enabled?
Hi. So do your clients connect at the next phone home interval?
We set up our deployment servers behind a load balancer with sticky connections. I have the clients phone home every 5 minutes, not every minute. We rarely change our inputs.conf, so every five minutes is more than adequate.
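For reference, a five-minute phone home interval would look roughly like this in deploymentclient.conf on each forwarder. Treat it as a sketch rather than our exact file: the host name ds-lb.example.com is a placeholder for your load-balanced DNS name, and 8089 is simply the management port that appears in the log line above.

```
# deploymentclient.conf on each universal forwarder (illustrative sketch)
[deployment-client]
# Phone home every 5 minutes (300 seconds) instead of every minute
phoneHomeIntervalInSecs = 300

[target-broker:deploymentServer]
# Placeholder host name: substitute your load-balanced DNS name
targetUri = ds-lb.example.com:8089
```

The forwarder normally needs a restart to pick up a change to this file.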
I looked, and we get the same error you do, but 5 minutes later, at the next phone home interval, the connection works.
It usually takes a few attempts for the clients to connect. I set the phone home interval to 2 minutes for testing purposes, and I see that the errors are intermittent. I'll ask our network team to enable sticky connections and post an update with my findings.
Or open a support case if this persists. You're trying something a bit complex with load balanced DS so it can't hurt to get Splunk support on board.