Deployment Server Connection Errors

chris94089 · ‎09-25-2020

Greetings,

I'm hosting a Splunk Deployment Server in a Kubernetes environment. When I'm using one replica, connections are smooth. But, when I use two or more, I get this error:

PubSubSvr - sender=connection_<my_client> channel=tenantService/handshake Message not dispatched (connection invalid)

The Server receives every phone home, but it doesn't do anything else.

What are some of the causes for this?

isoutamo · ‎09-25-2020

Hi
If I understood correctly, you are trying to use several DSs? If so, check the crossServerChecksum value in serverclass.conf (https://docs.splunk.com/Documentation/Splunk/7.3.3/Admin/Serverclassconf); it must be true.
r. Ismo

chris94089 · ‎09-28-2020

This response would be appropriate for a situation where the deployment server keeps re-deploying its apps to clients over and over again. I think crossServerChecksum tells Splunk to ignore the timestamp mismatch, but what exactly it does is not really documented.

That is not the case here.

The behavior I am asking about is Splunk treating certain connections as invalid when it is placed behind a load balancer, even though curl works, I can access Splunk Web, and so on.

Are there other settings one should configure for a multi Deployment Server?

ips_mandar · ‎04-22-2021

Hi @chris94089, were you able to resolve these errors? Can you provide some details? I am getting the same errors on my deployment servers.

Thanks,

chris94089 · ‎04-22-2021

The only workaround I was able to use involved spinning up a custom nginx load balancer instance and using the hash addr algorithm.

I tried searching for the link on how to use hash_addr, but I’m not finding it. It may have been updated. I’m no longer on the project, so I can’t check whether my specific workaround is still working.

The reason for using hashing is that the same DS and DC pair needs at least two (or three) back-to-back phone homes to successfully enroll a new DC. After that, a DC can get its updates from any of the other DSs placed behind the load balancer.

The behavior goes like this: a new deployment client (DC) phones home, passes through the load balancer, and hits deployment server 1 (DS1). Then it phones home again, but if the load balancer routes it to deployment server 2 (DS2), you’ll get the invalid connection error, and the enrollment just loops again. This is what happens with round-robin algorithms.
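To illustrate why round robin never gets out of that loop, here is a toy simulation in Python. It is only a sketch of the behavior described above, not Splunk's actual handshake logic: the `enroll` function, the `needed=2` threshold, and the server names are assumptions made for the example.

```python
from itertools import cycle


def enroll(route, needed=2, max_attempts=10):
    """Toy model (assumption): enrollment succeeds once `needed`
    consecutive phone homes land on the same deployment server.

    `route` is a callable returning the DS chosen for each phone home.
    Returns the number of phone homes used, or None if it never succeeds.
    """
    last, streak = None, 0
    for attempt in range(1, max_attempts + 1):
        ds = route()
        # Extend the streak only when the same DS answers again.
        streak = streak + 1 if ds == last else 1
        last = ds
        if streak >= needed:
            return attempt
    return None


servers = ["DS1", "DS2"]

# Round robin alternates DS1, DS2, DS1, ... so the streak never reaches 2.
rr = cycle(servers)
round_robin_result = enroll(lambda: next(rr))

# A sticky route always sends this client to the same DS.
sticky_result = enroll(lambda: "DS1")

print("round robin:", round_robin_result)
print("sticky:", sticky_result)
```

With two servers and strict alternation, the round-robin case never sees two consecutive phone homes on the same DS, while the sticky case completes on the second phone home.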

Phone homes don’t use sessions, so cookies won’t work either. So my approach was to have the load balancer “remember” which DC was phoning home to which DS some other way. This seems necessary only during the initial enrollment attempts. After the DC has its apps, it doesn’t care which DS it connects to for app updates.
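The idea of "remembering" without sessions can be sketched as hashing the client address to pick a server. The Python below is a minimal illustration of that general technique, not nginx's implementation or the exact configuration I used; `pick_server`, the sample address, and the server names are hypothetical.

```python
import hashlib


def pick_server(client_addr, servers):
    """Map a client address to one server deterministically.

    The same address always yields the same server as long as the
    server list is unchanged, so no session or cookie is needed.
    """
    digest = hashlib.sha256(client_addr.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap
    # it onto the list of servers.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["DS1", "DS2", "DS3"]

# Five phone homes from the same (hypothetical) client address.
picks = [pick_server("10.0.0.17", servers) for _ in range(5)]
print(picks)
```

Every phone home from the same address lands on the same DS, which is what the enrollment needs. Note that with this simple modulo scheme, adding or removing a DS changes the mapping for many clients, and clients that share one source address (for example behind NAT) would all land on the same DS.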

isoutamo · ‎04-22-2021

Have you seen this https://conf.splunk.com/files/2019/slides/FN2048.pdf ?
