Deployment Architecture

multiple deployment servers - checksum mismatch among instances of apps

dstaulcu
Builder

We have three deployment servers (DS1, DS2, DS3) in our Splunk environment. DS2 and DS3 are deployment clients of the deployment server on DS1, which keeps their deployment-apps directories synchronized.

From a file-system perspective, the content in deployment-apps appears identical across DS1, DS2, and DS3. However, when we rotate universal forwarder (UF) deployment clients among DS1, DS2, and DS3, the UFs detect a checksum mismatch when checking for app updates and download the apps all over again. This happens every time a client rotates to another DS.

So I guess the question is: what factors go into the checksum calculation? Is it more than just an MD5 of the app folder? Does the checksum take into account other factors, such as creation dates or a host-specific salt? I don't understand why we are seeing different checksums for identical content across these DS instances.

Input appreciated.


sciurus
Path Finder

The checksum is an implementation detail (and therefore subject to change), but it appears to be the top half (first 64 bits) of an MD5 hash of the bundle file ($SPLUNK_HOME/var/run/*/*.bundle). For example:

12-05-2013 18:43:01.040 +1100 INFO DeployedApplication - Checksum mismatch 16438968627284953542 <> 14029372552826739163 for blah blah blah

14029372552826739163 (decimal) = C2B2580188DA59DB (hex)

# md5sum deployment-client-1386229056.bundle

c2b2580188da59db6c3a817dc763a82e deployment-client-1386229056.bundle
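To reproduce the arithmetic above, here is a minimal Python sketch. It assumes the behaviour observed in this answer (checksum = first 16 hex digits of the bundle's MD5, read as an unsigned 64-bit integer), which is undocumented and may change between versions. `checksum_from_md5_hex` and `bundle_checksum` are hypothetical helper names, not Splunk functions.

```python
import hashlib


def checksum_from_md5_hex(md5_hex: str) -> int:
    """Read the top half (first 16 hex digits = 64 bits) of an MD5 hex
    digest as an unsigned integer, as observed in the log line above."""
    return int(md5_hex[:16], 16)


def bundle_checksum(path: str) -> int:
    """MD5 a bundle file in chunks, then keep the top 64 bits.
    Hypothetical helper; Splunk's real calculation is undocumented."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return checksum_from_md5_hex(md5.hexdigest())


# Values copied from the md5sum output and log line above
value = checksum_from_md5_hex("c2b2580188da59db6c3a817dc763a82e")
print(value)               # 14029372552826739163
print(format(value, "X"))  # C2B2580188DA59DB
```

On a DS you could point `bundle_checksum()` at a file under $SPLUNK_HOME/var/run/ and compare the result with the numbers in a client's "Checksum mismatch" message.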

The bundle itself is a tarball of the app directory, and it's generated when you start Splunk, or run "splunk reload deploy-server".

So to answer your question, it takes into account whatever metadata tar records, which should include timestamps, permissions, etc., but not host-specific information.
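A quick way to see why identical app content can still produce different bundle checksums: a tar header records each member's modification time, so the same bytes archived with different mtimes hash differently. This sketch uses Python's `tarfile` module on in-memory data. It assumes an uncompressed archive and does not reproduce Splunk's actual bundle format; the app content and timestamps are made up for illustration.

```python
import hashlib
import io
import tarfile


def make_bundle(files: dict, mtime: int) -> bytes:
    """Build an uncompressed tar archive in memory, stamping every
    member with the same modification time."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mode = 0o644
            info.mtime = mtime  # written into this member's tar header
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical app content, byte-for-byte identical on every DS
app = {"my_app/default/app.conf": b"[install]\nstate = enabled\n"}

same_time_1 = make_bundle(app, mtime=1386229056)
same_time_2 = make_bundle(app, mtime=1386229056)
one_sec_later = make_bundle(app, mtime=1386229057)

print(hashlib.md5(same_time_1).digest() == hashlib.md5(same_time_2).digest())    # True
print(hashlib.md5(same_time_1).digest() == hashlib.md5(one_sec_later).digest())  # False
```

Only the timestamp differs between the last two archives, yet their MD5s (and therefore the top-64-bit checksums) no longer match.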

I haven't tried it myself, but you should be able to push an app from DS1 to the deployment-apps directory on DS2/3 using the targetRepositoryLocation (not repositoryLocation) setting in serverclass.conf, and have it remain identical. Unfortunately, DS2/3 won't detect the change without a restart of Splunk or a manual "splunk reload deploy-server".


Jason
Motivator

As of Splunk 6.3, there is an optional attribute, crossServerChecksum, in serverclass.conf.

The default is false (the old behavior), so upgrading to 6.3.x or later doesn't immediately resend all apps to all clients. If set to true, it uses a different algorithm (one that does not include timestamps and similar metadata) so that one DS computes the same checksums as another, which lets them sit behind a load balancer.
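The thread doesn't describe how crossServerChecksum actually computes its value. As an illustration of the general idea only (not Splunk's implementation), here is a sketch of a timestamp-independent checksum: it hashes each file's relative path, size, and contents in a fixed order and never feeds mtimes or permissions into the hash. `content_checksum` and `demo` are hypothetical names.

```python
import hashlib
import os
import tempfile


def content_checksum(app_dir: str) -> str:
    """Timestamp-independent checksum: hash each file's relative path,
    length and bytes in sorted order. mtimes, owners and permission bits
    are never fed into the hash, so they cannot change the result."""
    md5 = hashlib.md5()
    for root, dirs, files in os.walk(app_dir):
        dirs.sort()  # make the directory walk order deterministic
        for name in sorted(files):
            full = os.path.join(root, name)
            rel = os.path.relpath(full, app_dir).replace(os.sep, "/")
            with open(full, "rb") as f:
                data = f.read()
            md5.update(rel.encode("utf-8") + b"\0")  # path, NUL-terminated
            md5.update(len(data).to_bytes(8, "big"))  # length prefix
            md5.update(data)
    return md5.hexdigest()


def demo() -> bool:
    """Write the same content to two directories with different mtimes."""
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        for base, ts in ((d1, 1_000_000_000), (d2, 1_500_000_000)):
            os.makedirs(os.path.join(base, "default"))
            path = os.path.join(base, "default", "app.conf")
            with open(path, "wb") as f:
                f.write(b"[install]\nstate = enabled\n")
            os.utime(path, (ts, ts))  # give each copy a different timestamp
        return content_checksum(d1) == content_checksum(d2)


print(demo())  # True
```

Because only paths and bytes are hashed, two servers holding the same files agree on the checksum regardless of when those files were written or when the bundle was rebuilt.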

zabrahamson
New Member

Bumping this in case anyone found a solution. We have two deployment server instances behind a load balancer, with crossServerChecksum set to true on both, yet the connecting clients still re-download apps perpetually.


Jason
Motivator

This is the situation crossServerChecksum should fix.
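When the setting seems to have no effect, one thing worth ruling out is that it isn't actually active on every server. Below is a hypothetical Python check; it only looks in the [global] stanza, on the assumption that this is where crossServerChecksum is read from (confirm against serverclass.conf.spec for your version), and it assumes no settings appear above the first stanza header, which configparser would reject.

```python
import configparser


def cross_server_checksum_enabled(conf_text: str) -> bool:
    """Return True if the [global] stanza enables crossServerChecksum.
    Hypothetical helper; assumes the setting lives in [global]."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # Splunk .conf files separate keys with '='
        interpolation=None,  # leave '$SPLUNK_HOME' and '%' untouched
        strict=False,        # tolerate duplicate stanzas/keys
    )
    parser.optionxform = str  # keep camelCase attribute names as written
    parser.read_string(conf_text)
    # A missing stanza or setting means false, the default noted above.
    return parser.getboolean("global", "crossServerChecksum", fallback=False)


good = "[global]\ncrossServerChecksum = true\n\n[serverClass:all]\nwhitelist.0 = *\n"
only_in_class = "[serverClass:all]\ncrossServerChecksum = true\nwhitelist.0 = *\n"
print(cross_server_checksum_enabled(good))           # True
print(cross_server_checksum_enabled(only_in_class))  # False
```

Since settings can be spread across several serverclass.conf files, the merged view from `splunk btool serverclass list` is probably a better input than any single file. And as sciurus noted, bundles are only regenerated on restart or "splunk reload deploy-server", so presumably every DS needs one of those after the setting changes.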



dstaulcu
Builder

Thank you! Perfect answer.

We were noticing that each time our UFs alternated among DS1, DS2, and DS3 (the UFs point to a load balancer that redirects them to the least-loaded DS), they would download the apps and restart even though we had not changed any content. Now I understand why: our DS1, DS2, and DS3 load their bundles at slightly different times.

Edit: while changing the serverclass to deliver the app into the deployment-apps repository is useful, it does not eliminate the bundle checksum differences that result from non-synchronized reloads or restarts among the DS instances.
