Hi all,
I'm having trouble with deployment clients, and the problem seems to have started after an upgrade to 6.1.3. Deployment clients that were previously working fine have stopped picking up application changes and are no longer logging to their forwarders. I suspect this could be an SSL issue (the upgrade process here re-installs clients from scratch), but I would welcome any pointers or suggestions for things to try.
The Deployment Server (which is also an Indexer and Search Head -- it's a small application!) is seeing the following in splunkd.log:
09-09-2014 11:36:32.594 +1000 WARN PubSubSvr - sender=connection_10.16.X.Y_8089_hostname.domain.tld_hostname_4AD3FF55-2746-1234-BB83-FE0EAB41B309 channel=deploymentServer/phoneHome/default Message not dispatched (connection invalid)
The deployment clients are seeing the following messages in splunkd.log:
09-05-2014 10:55:03.507 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.
09-05-2014 10:55:03.507 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.
09-05-2014 10:55:27.623 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.
09-05-2014 10:55:27.623 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.
09-05-2014 10:55:51.674 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.
09-05-2014 10:55:51.674 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.
09-05-2014 10:55:51.792 +1000 INFO DC:HandshakeReplyHandler - Handshake done.
09-05-2014 10:55:51.799 +1000 WARN PubSubConnection - Cannot convert str: error to a valid status, returning eRejected.
09-05-2014 10:56:51.907 +1000 INFO DC:HandshakeReplyHandler - Handshake done.
09-05-2014 11:26:54.472 +1000 INFO NetUtils - Error in connection() 111 - Connection refused
09-05-2014 11:28:54.751 +1000 INFO DC:DeploymentClient - channel=deploymentServer/phoneHome/default Will retry sending phonehome to DS; err=not_connected
09-05-2014 11:29:54.752 +1000 INFO DC:DeploymentClient - channel=deploymentServer/phoneHome/default Will retry sending phonehome to DS; err=not_connected
09-05-2014 11:29:54.815 +1000 INFO HttpPubSubConnection - SSL connection with id: connection_10.16.X.Y_8089_hostname.domain.tld_hostname_4AD3FF55-2746-1234-BB83-FE0EAB41B309
09-05-2014 11:29:54.821 +1000 WARN PubSubConnection - Cannot convert str: error to a valid status, returning eRejected.
09-05-2014 11:29:54.821 +1000 WARN HttpPubSubConnection - Batch subscribe aborted as status is not eOk
(host names/GUIDs manually modified to protect the guilty 🙂)
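In case it helps anyone triaging similar output, here is a rough sketch for tallying splunkd.log lines by level and component, so the noisy components stand out. The line layout in the regex is an assumption inferred only from the lines quoted above, and `tally`, `LINE_RE`, and `SAMPLE` are hypothetical names for illustration, not part of any Splunk tooling.

```python
import re
from collections import Counter

# Assumed layout, inferred from the quoted lines only:
# "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) (?P<component>\S+) - (?P<message>.*)$"
)


def tally(lines):
    """Count log lines per (level, component); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("component"))] += 1
    return counts


# A few of the client-side lines from this post, used as sample input.
SAMPLE = [
    "09-05-2014 10:55:03.507 +1000 WARN DC:PhonehomeThread - No response to handshake for too long; starting over.",
    "09-05-2014 10:55:51.792 +1000 INFO DC:HandshakeReplyHandler - Handshake done.",
    "09-05-2014 10:55:51.799 +1000 WARN PubSubConnection - Cannot convert str: error to a valid status, returning eRejected.",
    "09-05-2014 11:26:54.472 +1000 INFO NetUtils - Error in connection() 111 - Connection refused",
]

if __name__ == "__main__":
    # Print one row per (level, component) pair, most frequent first.
    for (level, component), n in tally(SAMPLE).most_common():
        print(f"{level:5} {component:28} {n}")
```

To run it against a real file, pass an open file handle to `tally` instead of `SAMPLE`; the function only needs an iterable of strings.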