Hi everyone. I have deployed well over 300 forwarders in the past 48 hours. Our serverclass.conf file is laid out so that the top-level stanza defines an open server class (whitelist.0 = *), followed by an app stanza that assigns an app called Base_DeploymentClient to all servers. This app contains deploymentclient.conf.
This app has worked on every single server EXCEPT two. Nothing is different about these two servers; they are not special in any way. When Splunk starts on them, the following messages appear in splunkd.log:
11-07-2013 03:52:50.979 +0000 INFO DeployedApplication - Refreshed app: Base_DeploymentClient for service class: all_clients from archive: /opt/splunkforwarder/var/run/all_clients/Base_DeploymentClient-1383757716.bundle
11-07-2013 03:52:50.984 +0000 WARN DeploymentClient - Phonehome thread is now started.
11-07-2013 03:52:50.984 +0000 WARN DeploymentClient - Unable to send handshake message to deployment server. Error status is: not_connected
After this, no further messages reference the deployment client or the deployment server, which is unusual: our other boxes do log their phone-home activity. I have modified the other apps slightly in the $SPLUNK_HOME/etc/apps directory so that they have a different checksum, but nothing gets the boxes to phone home to the deployment server. For all intents and purposes, the deployment process doesn't seem to be running.
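One way to confirm that nothing else is logged for the deployment client is to filter splunkd.log by component. Here is a minimal Python sketch, assuming every line follows the layout of the excerpt above (timestamp, level, component, " - ", message). The function name and regex are made up for illustration and are not part of Splunk:

```python
import re

# Assumed line layout, taken from the excerpt above:
#   MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>\w+)\s+(?P<component>\w+) - (?P<message>.*)$"
)


def component_events(lines, component="DeploymentClient"):
    """Return (timestamp, level, message) tuples logged by one component."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("component") == component:
            events.append(
                (match.group("ts"), match.group("level"), match.group("message"))
            )
    return events


# Example use (the path is the forwarder default and is an assumption here):
# with open("/opt/splunkforwarder/var/log/splunk/splunkd.log") as fh:
#     for ts, level, message in component_events(fh):
#         print(ts, level, message)
```

If the last DeploymentClient entry is the failed handshake and nothing follows it, that matches what I am seeing on the two problem boxes.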
I have verified multiple times that they are able to communicate with the deployment server and aren't blocked by any ACL/firewall, so I don't know why the handshake fails immediately after startup.
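For reference, the kind of check meant here can be sketched in Python as a plain TCP connection attempt from the forwarder to the deployment server. The host name in the comment is hypothetical, and 8089 is only the default Splunk management port, so substitute whatever targetUri in deploymentclient.conf points at. A successful connect only shows the port is reachable; it says nothing about whether the handshake itself will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example use (hypothetical host; 8089 is the default management port):
# print(can_connect("deploy.example.com", 8089))
```

Running this on one of the two failing forwarders and on a working one gives a like-for-like comparison of basic reachability.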
Running splunk list deploy-clients on the deployment server doesn't show that these two servers have ever connected.
At this point, I am stumped. Any ideas on how to troubleshoot this? Thanks!
Have you tried setting up a deployment server on another box? Your desktop will work. Just set the two forwarders to phone home to your desktop/laptop for configs. If that doesn't work, there is more than likely a problem with the forwarders themselves. If it does work, run splunk diag on the deployment server and on both forwarders, then submit a ticket to Splunk explaining this. A deployment server should be able to handle about 2500 clients, depending on the number of CPUs and the amount of RAM you have.