Spent a day on this and have been seeking help in Splunk IRC. About to lose it.
Deployment Server states no clients have contacted server. splunkd.log on the Universal Forwarder contains the following:
10-11-2012 15:47:43.879 -0700 WARN DeploymentClient - Phonehome thread is now started.
10-11-2012 15:47:43.879 -0700 WARN DeploymentClient - Unable to send handshake message to deployment server. Error status is: not_connected
Conf on forwarder is located at $SPLUNK_HOME/etc/apps/deployment/default.
deploymentclient.conf contains the following:
disabled = false
targetUri = 10.45.222.191:8089
app.conf in that same directory contains the following:
state = enabled
On the Deployment Server, $SPLUNK_HOME/etc/system/local/serverclass.conf looks like this:
filterType = whitelist
#filterType = blacklist
repositoryLocation = /opt/splunk/etc/deployment-apps
restartSplunkd = true
whitelist.0 = 10.45.222.*
#blacklist.0 = 10.45.222.190
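A quick way to sanity-check whether a forwarder's address falls under a whitelist entry like the one above. This is only a sketch: it assumes plain glob-style matching where * matches any run of characters, which may differ in detail from how the deployment server evaluates filters, and matches_any is a hypothetical helper, not a Splunk API.

```python
from fnmatch import fnmatchcase


def matches_any(client_ip, patterns):
    """Return True if client_ip matches at least one glob-style pattern."""
    return any(fnmatchcase(client_ip, pattern) for pattern in patterns)


# Pattern copied from whitelist.0 above; the addresses are illustrative only.
whitelist = ["10.45.222.*"]
print(matches_any("10.45.222.190", whitelist))
print(matches_any("10.45.223.7", whitelist))
```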
I've run ./splunk reload deploy-server on the Deployment Server and ./splunk restart on the Universal Forwarder multiple times.
I've confirmed that the Universal Forwarder can hit the Deployment Server on 8089.
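For anyone wanting to repeat that connectivity check without extra tools, here is a minimal sketch of a plain TCP test. It only shows that something accepts connections on the port, not that the deployment server is answering correctly; can_connect is a hypothetical helper name.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from the forwarder, can_connect("10.45.222.191", 8089) would use the targetUri values from deploymentclient.conf above.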
I'm stumped and have spent a whole day on something I've done a dozen times before. Anyone have any ideas?
Changed the log.cfg on the Deployment Server to log debug messages and restarted Splunk... Hosts began reporting. Seems like Splunk just needed a restart 😕
Splunk UF unable to communicate with Deployment Server
The customer was having issues with Splunk forwarder agents not communicating with the Splunk deployment server. All the other servers in the same subnet, with the same configuration, were able to communicate.
The deploymentclient.conf and all other settings were the same across the entire batch. Three servers were not communicating. The customer tried both the DNS name and the IP address of the deployment server, and confirmed that network connectivity was available for this subnet.
Things to investigate:
On one of the affected forwarders, put the following channels in debug: DC:DeploymentClient, DC:UpdateServerclassHandler, DC:HandshakeReplyHandler, DC:PhonehomeThread, DSDCCommon, and HttpPubSubConnection. Then restart the UF and check splunkd.log.
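One way to put those channels in debug is through the [splunkd] stanza of the forwarder's logging config. The sketch below only renders the lines to add. It assumes the category.<channel>=DEBUG form and a local override file such as $SPLUNK_HOME/etc/log-local.cfg, neither of which is stated in this post, so check it against the log.cfg shipped with your version. Channel names are copied exactly as written above, and render_debug_stanza is a hypothetical helper.

```python
# Channel names copied verbatim from the list above.
CHANNELS = [
    "DC:DeploymentClient",
    "DC:UpdateServerclassHandler",
    "DC:HandshakeReplyHandler",
    "DC:PhonehomeThread",
    "DSDCCommon",
    "HttpPubSubConnection",
]


def render_debug_stanza(channels, level="DEBUG"):
    """Build the text of a [splunkd] stanza setting each channel to level.

    Assumption: the logging config uses one 'category.<name>=<LEVEL>' line
    per channel under [splunkd].
    """
    lines = ["[splunkd]"]
    lines += [f"category.{name}={level}" for name in channels]
    return "\n".join(lines) + "\n"


print(render_debug_stanza(CHANNELS))
```

Remember to revert the levels after troubleshooting, since debug logging can grow splunkd.log quickly.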
Also check splunkd.log on the deployment server. Grep splunkd.log for 'phone' (grep -nri phone splunkd.log) to find out what messages the phonehome thread is generating. A message like the following indicates an issue: '48781:10-07-2016 07:42:42.305 -0400 INFO DC:PhonehomeThread - Attempted handshake 20300 times. Will try to re-subscribe to handshake reply'
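If grep isn't handy, or you want the handshake count pulled out, the same search can be sketched in a few lines. This assumes the message wording shown in the example above ("Attempted handshake N times"); phonehome_lines and max_handshake_attempts are hypothetical helper names.

```python
import re

# Matches the counter in messages like "Attempted handshake 20300 times".
HANDSHAKE_RE = re.compile(r"Attempted handshake (\d+) times", re.IGNORECASE)


def phonehome_lines(lines):
    """Yield (line_number, text) for lines containing 'phone', case-insensitively."""
    for number, text in enumerate(lines, start=1):
        if "phone" in text.lower():
            yield number, text.rstrip("\n")


def max_handshake_attempts(lines):
    """Return the largest handshake attempt count seen, or 0 if none."""
    best = 0
    for _, text in phonehome_lines(lines):
        match = HANDSHAKE_RE.search(text)
        if match:
            best = max(best, int(match.group(1)))
    return best


# Sample lines taken from the log excerpts quoted in this thread.
sample = [
    "10-11-2012 15:47:43.879 -0700 WARN DeploymentClient - Phonehome thread is now started.",
    "10-07-2016 07:42:42.305 -0400 INFO DC:PhonehomeThread - Attempted handshake 20300 times. "
    "Will try to re-subscribe to handshake reply",
]
print(max_handshake_attempts(sample))
```

To scan a real file, pass an open file object, e.g. max_handshake_attempts(open("splunkd.log")). A large and growing count suggests the client keeps retrying without ever completing the handshake.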
Though working and non-working deployment clients were reportedly in the same subnet in this situation (and may be in yours), a firewall may still exist on the local computer itself. Try a curl call from the forwarder to port 8089 of the deployment server by running 'curl -ku admin:changeme https://servername:8089/services/deployment/server/applications', or run 'iptables -L', to confirm whether connectivity is blocked locally even though the port appears to be open and listening.
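Where curl isn't installed on the forwarder host, roughly the same request can be sketched with the Python standard library. As with curl -k, certificate verification is turned off here, which is only reasonable for a quick diagnostic. The server name and credentials are the placeholders from the curl example above; build_request and fetch_status are hypothetical helper names.

```python
import base64
import ssl
import urllib.request


def build_request(server, user, password, port=8089):
    """Build a request for the endpoint used in the curl example, with basic auth."""
    url = f"https://{server}:{port}/services/deployment/server/applications"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_status(request, timeout=5.0):
    """Send the request without verifying the certificate (like curl -k).

    Returns the HTTP status code on success. urllib raises URLError when the
    connection is blocked or refused, and HTTPError for 4xx/5xx responses.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return resp.status
```

For example, fetch_status(build_request("servername", "admin", "changeme")) run on the forwarder. A URLError points at a network or firewall problem, while an HTTPError means the server was reached and the problem lies elsewhere.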