I noticed that when I change the phone home interval from 30 or 60 seconds to, for example, 300 seconds, the deployment server logs "dsevent=connected" for each host and then, approximately a minute later, logs "dsevent=connection_lost". The other symptom is that while the connection is lost, the hosts no longer show up in the deployment server's web interface. You can't browse which apps are being deployed to a host until it "reconnects".
I would like the clients to check in only every 5 or 10 minutes. The problem is that, the way this currently works, the clients keep "disconnecting" and then "reconnecting", so the web page is never an accurate report of what is being deployed to whom.
This is affecting Windows and Linux Universal Forwarders.
Since the forwarder is making a TCP connection to the deployment server, that connection is likely to be dropped after a short time if it sits idle. I suspect that you could change the TCP settings on the forwarder to keep the connection alive.
Another approach would be to write your own search or dashboard to get the status of the deployment clients. You might start with this:
index="_internal" sourcetype="splunkd" component="DeploymentMetrics" |
rename scName as serverClass fqname as install_location hostname as deploymentClient |
table _time deploymentClient ip serverClass appName event status reason install_location
I think that would work, but it is a bit impractical. The Splunk deployment server should know that a host is set to check in at interval "X" and then keep track of whether it has received a check-in from that deployment client within that interval. That would be much more dynamic. The issue here is that the Splunk deployment server isn't keeping track of when hosts have checked in. Even a hard limit of 15 minutes would help: if it hasn't seen a check-in from a host in 15 minutes, remove it from the list. That is more reasonable than how it is currently being handled.
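To make the behaviour I am asking for concrete, here is a rough Python sketch of the bookkeeping. It is not how the deployment server is implemented. The class name `CheckInTracker`, the `grace_factor` tolerance, and the host names are all made up for illustration, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from typing import Dict, List

HARD_LIMIT_SECS = 15 * 60  # the 15-minute ceiling suggested above


class CheckInTracker:
    """Hypothetical tracker: remembers each client's interval and last check-in."""

    def __init__(self, grace_factor: float = 2.0) -> None:
        # grace_factor is an assumed tolerance so one late check-in
        # does not immediately mark a client as lost.
        self.grace_factor = grace_factor
        self._interval: Dict[str, float] = {}
        self._last_seen: Dict[str, float] = {}

    def check_in(self, host: str, interval_secs: float, now: float) -> None:
        """Record a check-in along with the interval the client is set to."""
        self._interval[host] = interval_secs
        self._last_seen[host] = now

    def timeout_for(self, host: str) -> float:
        """Allowed silence for a host: scaled interval, capped at the hard limit."""
        return min(self._interval[host] * self.grace_factor, HARD_LIMIT_SECS)

    def connected(self, now: float) -> List[str]:
        """Hosts whose last check-in is still within their allowed silence."""
        return sorted(
            host
            for host, seen in self._last_seen.items()
            if now - seen <= self.timeout_for(host)
        )


tracker = CheckInTracker()
tracker.check_in("web01", interval_secs=300, now=0)
tracker.check_in("db01", interval_secs=60, now=0)
print(tracker.connected(now=200))  # web01 is still listed; db01 has gone quiet too long
```

With this kind of rule, a client on a 300-second interval would stay in the list between check-ins instead of dropping out after about a minute. Note that the cap means any client configured with an interval longer than 15 minutes would still fall off the list between check-ins.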