
How to troubleshoot why a deployment client is unable to phone home to the deployment server?

New Member

We are unable to get the deployment client to show in the deployment console. Other Windows/Linux servers are connected and apps are being distributed fine.

Deployment Client:

  • Windows 2012 x64
  • Splunk version 6.2.4

Deployment server:

  • oel 6 x64
  • splunk version 6.2.0

We have validated that the client can telnet to the deployment server on the correct port. We could see the TCP transaction on both sides, and we enabled debug logging on both the client and the deployment server. The deployment server log has no entry regarding the client.
Client splunkd.log

08-12-2015 16:33:03.791 -0700 DEBUG DC:PhonehomeThread - PhonehomeThread::main top-of-loop, DC state=Initial
08-12-2015 16:33:03.791 -0700 DEBUG DC:PhonehomeThread - Attempting handshake
08-12-2015 16:33:03.791 -0700 DEBUG DC:DeploymentClient - Sending message <handshake/> to tenantService/handshake
08-12-2015 16:33:03.791 -0700 INFO  DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
08-12-2015 16:33:03.791 -0700 DEBUG DC:PhonehomeThread - Handshake not yet finished; will retry every 12.0sec
08-12-2015 16:33:03.791 -0700 DEBUG DC:PhonehomeThread - Phonehome thread will wait for 12.0sec (1)

Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Motivator

[Edited to preface with the caveat that I am assuming your initial telnet test is correctly framed, and that your idea of "the correct port" is 8089.]

Are you using SSL? Is it correctly configured?

The only real way to judge here is to run comparative tcpdumps: one from a working machine (preferably one in the same routed network zone, assuming there is firewalling and segregation going on here) and one for the machine that is failing (which, since it is a Windows box, would require a Linux installation receiving duplicate packets from the switch).

You could also run tcpdump on the deployment server itself, to see if something odd is going on there.


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

New Member

I went ahead and added enableSplunkdSSL = false to the server.conf file on both the deployment server and the client. This should remove any issues with SSL. The issue still persists.

Client splunkd.log


08-14-2015 11:34:06.132 -0700 DEBUG DC:PhonehomeThread - PhonehomeThread::main top-of-loop, DC state=Initial
08-14-2015 11:34:06.132 -0700 DEBUG DC:PhonehomeThread - Attempting handshake
08-14-2015 11:34:06.132 -0700 DEBUG DC:DeploymentClient - Sending message <handshake/> to tenantService/handshake
08-14-2015 11:34:06.132 -0700 INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
08-14-2015 11:34:06.132 -0700 DEBUG DC:PhonehomeThread - Handshake not yet finished; will retry every 12.0sec
08-14-2015 11:34:06.132 -0700 DEBUG DC:PhonehomeThread - Phonehome thread will wait for 12.0sec (1)
08-14-2015 11:34:17.153 -0700 WARN HttpPubSubConnection - HTTP client error in http pubsub Read Timeout uri=http://10.156.101.127:2000/services/broker/connect/0A43BEC6-915B-488E-A60B-8241F1680FAF/IODWAPP242/2...
08-14-2015 11:34:17.153 -0700 WARN HttpPubSubConnection - Unable to parse message from PubSubSvr:
08-14-2015 11:34:17.153 -0700 INFO HttpPubSubConnection - Could not obtain connection, will retry after=71 seconds.
08-14-2015 11:34:18.132 -0700 DEBUG DC:PhonehomeThread - PhonehomeThread::main top-of-loop, DC state=Initial
08-14-2015 11:34:18.132 -0700 DEBUG DC:PhonehomeThread - Attempting handshake
08-14-2015 11:34:18.132 -0700 DEBUG DC:DeploymentClient - Sending message <handshake/> to tenantService/handshake
08-14-2015 11:34:18.132 -0700 INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
08-14-2015 11:34:18.132 -0700 DEBUG DC:PhonehomeThread - Handshake not yet finished; will retry every 12.0sec
08-14-2015 11:34:18.132 -0700 DEBUG DC:PhonehomeThread - Phonehome thread will wait for 12.0sec (1)
08-14-2015 11:34:30.133 -0700 DEBUG DC:PhonehomeThread - PhonehomeThread::main top-of-loop, DC state=Initial
08-14-2015 11:34:30.133 -0700 DEBUG DC:PhonehomeThread - Attempting handshake
08-14-2015 11:34:30.133 -0700 DEBUG DC:DeploymentClient - Sending message <handshake/> to tenantService/handshake
08-14-2015 11:34:30.133 -0700 INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
08-14-2015 11:34:30.133 -0700 DEBUG DC:PhonehomeThread - Handshake not yet finished; will retry every 12.0sec
08-14-2015 11:34:30.133 -0700 DEBUG DC:PhonehomeThread - Phonehome thread will wait for 12.0sec (1)
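
When a debug log like the one above gets long, it can help to summarize it rather than read it line by line. The following is only a minimal sketch: the function name `summarize` is hypothetical, and it assumes the `err=` and `uri=` fields appear as plain `key=value` tokens, as they do in this excerpt. It counts each `err=` value and collects the scheme, host, and port of every `uri=` the client reports, which makes it easy to compare the port the client actually uses with the port that was tested by telnet.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Patterns for the key=value tokens seen in the splunkd.log excerpt.
ERR_RE = re.compile(r"\berr=(\S+)")
URI_RE = re.compile(r"\buri=(\S+)")


def summarize(log_lines):
    """Count err= values and collect (scheme, host, port) for each uri= value."""
    errors = Counter()
    targets = set()
    for line in log_lines:
        err_match = ERR_RE.search(line)
        if err_match:
            errors[err_match.group(1)] += 1
        uri_match = URI_RE.search(line)
        if uri_match:
            parts = urlsplit(uri_match.group(1))
            targets.add((parts.scheme, parts.hostname, parts.port))
    return errors, targets


# Usage (the path is an assumption; adjust it for your installation):
# with open("splunkd.log", encoding="utf-8", errors="replace") as fh:
#     errors, targets = summarize(fh)
#     print(errors, targets)
```

For the lines pasted above, the only `uri=` entry uses `http` against port 2000 on 10.156.101.127, so it may be worth confirming that this is the port the telnet test was run against.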


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Motivator

In that case - and assuming that your previously mentioned telnet test was correctly framed to the right port - I fall back to the suggestion of a tcpdump to analyse the actual network traffic at the deployment server.


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Splunk Employee

Before you start looking at TCP dumps, can you confirm you have full network connectivity from the host to the DS? You'll need TCP to the DS on 8089 (unless you changed the management port), and also the ability to open dynamic ports for the download of data from the DS to the client.
Additionally, make sure you have a serverclass defined with an app in it for the client you are trying to connect.
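
If telnet is not available on the client, the same basic reachability check can be scripted. This is a rough sketch, not a Splunk tool: `can_connect` is a hypothetical helper, and the host name in the usage comment is a placeholder. It only shows that a TCP connection can be opened; it says nothing about SSL settings or about whether splunkd answers the handshake.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder host; 8089 is the default management port):
# print(can_connect("deploymentserver.example.com", 8089))
```

Running this from both a working client and the failing one gives a quick side-by-side comparison before moving on to packet captures.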


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Contributor

+1 to what @Esix said.

Additionally, there are times when firewalls and auth/transparent proxies play evil and restrict the connection.


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Motivator

They already mentioned that they have connectivity with a quick Telnet test. Admittedly I am assuming that the phrase "to the correct port" means what it says.


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Motivator

Missing serverclass is not going to cause the handshake to fail, surely? Unless things have changed in V6 it will just result in no class matches and hence an empty configuration deployment. The handshake will still complete.

A tcpdump is a very quick and direct method of answering a whole bundle of fundamental network questions by direct observation and without the need for any circumstantial inference, before tinkering with configurations. You will know whether the packets are getting through, whether they are complete, the exact nature of the response if they are. There is absolutely no point in refining configurations if the fault lies on one end not talking correctly to the other.


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

New Member

I am having this same issue myself while trying to add deployment client functionality to existing heavy forwarders. In fact, when I run tcpdumps, I see this error message in the logs:

"channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected"

When there are ZERO packets that have gone across the wire. The error is appearing without the DC even attempting to contact the server.

This is with Splunk Enterprise v6.2.2, with the deployment server running on a cluster master. I am not concerned about performance here; this is on a dev box just to, ahem, prove that deployment server works.


Re: How to troubleshoot why a deployment client is unable to phone home to the deployment server?

Contributor

As with the problem described above, check your connection: telnet, ping, firewall settings, and the syntax in the config files.
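
For the config-file part of that checklist, one thing worth verifying is what the client has actually been told to connect to. The sketch below assumes the usual `deploymentclient.conf` layout, with a `[target-broker:deploymentServer]` stanza containing a `targetUri = host:port` setting. `read_target_uri` is a hypothetical helper and not part of Splunk. Python's `configparser` is only an approximation of Splunk's own parser: it will reject files that have settings before the first stanza, for example.

```python
import configparser

SECTION = "target-broker:deploymentServer"


def read_target_uri(conf_text: str):
    """Return (host, port) from targetUri, or None if it is missing or malformed."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)
    if not parser.has_option(SECTION, "targetUri"):
        return None
    value = parser.get(SECTION, "targetUri").strip()
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    return host, int(port)


# Usage (the path is an assumption; adjust it for your installation):
# with open("deploymentclient.conf", encoding="utf-8") as fh:
#     print(read_target_uri(fh.read()))
```

A `None` result, or a port that differs from the one used in the telnet test, points at the configuration rather than the network.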
