How to identify why cluster-bundle rolls back after configuration has been applied to part of the cluster?

nwales
Path Finder

The command runs successfully, and I can see that the configuration changes have been made on a few servers in the cluster.

However, it then stops, rolls back to the previous configuration, and restarts each instance. The checksum is still the previous one throughout.

Running show cluster-bundle-status returns the output below, but I can't work out where to see what the issues are. Validation is clearly working on the master, otherwise it wouldn't push out the configuration in the first place.

master
active_bundle=5ACC4E37227401B8B0A6FACC44AD2E06
latest_bundle=5ACC4E37227401B8B0A6FACC44AD2E06
invalid_bundle
bundle_path:/opt/splunk/var/run/splunk/cluster/remote-bundle/d6323d1284cbf82ac055a1fbdfa44483-1408611584.bundle
bundle_validation_errors_on_master:
checksum:5ACC4E37227401B8B0A6FACC44AD2E06
timestamp:1408611584
reload=1
cluster_status=Issued bundleReload to the peers. Waiting for all peers to return the status.

dbhagi_splunk
Splunk Employee

When you apply a new bundle on the cluster master, the master first validates it locally and, if that validation succeeds, sets its latest bundle ID to the new bundle's checksum. The peers then download the latest bundle and validate it individually. If one or more peers report validation errors, the cluster master rolls back to the old bundle and all peers continue with the old bundle. In this case the master does not issue a rolling restart of the peers, and it logs the error message below in the master's splunkd.log.

"Cannot continue with rolling restart. Found bundle validation errors." Followed by error messages from peers.

Peers also log their bundle validation errors to their respective splunkd.logs. You will find something like: "Informed bundle error status to the master for bundle id=" in the peer's splunkd.log.
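As a rough aid for searching those logs, here is a minimal Python sketch that scans a splunkd.log for the two message fragments quoted above and prints each match with a few following lines. The function name `find_bundle_errors` and the `context` parameter are hypothetical, the marker strings are copied from this answer and may be worded differently in other Splunk versions, and the log path is whatever you pass on the command line.

```python
import sys
from typing import Iterable, List, Tuple

# Message fragments quoted in this thread; exact wording may vary by Splunk version.
MASTER_MARKER = "Cannot continue with rolling restart. Found bundle validation errors."
PEER_MARKER = "Informed bundle error status to the master for bundle id="


def find_bundle_errors(lines: Iterable[str], context: int = 5) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each marker line plus the `context` lines after it."""
    hits: List[Tuple[int, str]] = []
    remaining = 0  # how many more lines to keep after the latest match
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if MASTER_MARKER in text or PEER_MARKER in text:
            remaining = context + 1  # the matching line itself plus its context
        if remaining > 0:
            hits.append((number, text))
            remaining -= 1
    return hits


# Only runs when a log path is given, e.g.: python find_bundle_errors.py <path to splunkd.log>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_bundle_errors(handle):
            print(f"{number}: {text}")
```

Run it against the master's splunkd.log first, since the peers' error messages should follow the master's "Cannot continue" line, then against the splunkd.log of any peer named there.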

Please look for validation errors reported by the peers in the master's splunkd.log, and on the peers themselves. I can shed more light on what might have gone wrong if I can see the master's splunkd.log and the splunkd.log of the peer where validation failed (if we confirm that's what happened). Hope this helps.
