How to properly configure the deployment client phone home interval

mikesaia
Path Finder

I noticed that when I change the phone home interval from 30 or 60 seconds to, for example, 300 seconds, the deployment server logs "dsevent=connected" for each host and then, approximately a minute later, logs "dsevent=connection_lost". The other symptom is that once the connection is lost, the hosts no longer show up in the deployment server's web interface. You can't browse which apps are being deployed to a host until it "reconnects".

I would like the clients to check in only every 5 or 10 minutes. The problem is that, the way this currently works, the clients keep "disconnecting" and then "reconnecting", so the web page is never an accurate report of what is being deployed to whom.

This is affecting Windows and Linux Universal Forwarders.

lguinn2
Legend

Since the forwarder is making a TCP connection to the deployment server, TCP is going to drop the connection after a short time if the connection is idle. I suspect that you could change your TCP settings on the forwarder to keep the TCP connection alive.

Another approach would be to write your own search or dashboard to get the status of the deployment clients. You might start with this:

index="_internal" sourcetype="splunkd" component="DeploymentMetrics" | 
rename scName as serverClass fqname as install_location hostname as deploymentClient | 
table _time deploymentClient ip serverClass appName event status reason install_location

mikesaia
Path Finder

I'm going to try this out and get back to you. Thanks for the thought.

lguinn2
Legend

I see what you mean. My answer is not a good one, then. I've edited my answer with a fresh idea.

mikesaia
Path Finder

I think that would work, but it is a bit impractical. The Splunk deployment server should know that a host is set to check in at interval "X" and then keep track of whether it has had a check-in from that deployment client within that interval. That way it is much more dynamic. The issue here is that the Splunk deployment server isn't keeping track of when hosts have checked in. Even a hard limit of 15 minutes would help: if it hasn't seen a check-in from a host in 15 minutes, remove it from the list. That would be more reasonable than how it is currently being handled.
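
To make the idea concrete, here is a minimal Python sketch of the tracking logic described above. It is not how the Splunk deployment server works internally. The class name ClientTracker, its methods, and the grace factor of 2 are all hypothetical, chosen only for illustration. A host stays listed as long as its last check-in is within its own interval times the grace factor, and hosts with an unknown interval fall back to the 15-minute hard limit.

```python
class ClientTracker:
    """Hypothetical check-in tracker; not part of Splunk."""

    def __init__(self, hard_limit_secs=900.0, grace_factor=2.0):
        # 900 s is the 15-minute fallback suggested in the post.
        self.hard_limit_secs = hard_limit_secs
        # Assumption: tolerate one missed check-in before dropping a host.
        self.grace_factor = grace_factor
        self._clients = {}  # host -> (last_seen, interval_secs or None)

    def check_in(self, host, interval_secs, now):
        """Record a check-in; interval_secs may be None if unknown."""
        self._clients[host] = (now, interval_secs)

    def active_hosts(self, now):
        """Return the hosts whose last check-in is still within its window."""
        active = []
        for host, (last_seen, interval) in self._clients.items():
            if interval is None:
                allowed = self.hard_limit_secs
            else:
                allowed = interval * self.grace_factor
            if now - last_seen <= allowed:
                active.append(host)
        return sorted(active)


tracker = ClientTracker()
tracker.check_in("web01", 300, now=0)   # 5-minute phone home interval
tracker.check_in("db01", None, now=0)   # interval unknown, uses hard limit
print(tracker.active_hosts(now=500))
print(tracker.active_hosts(now=700))
```

With a 300-second interval, web01 would remain listed for up to 600 seconds after its last check-in instead of dropping off after about a minute.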
