Why is the cluster master stuck at "Bundle validation is in progress" indefinitely after a configuration bundle update?

Path Finder

As above: I kicked off a configuration bundle update, and the cluster master has been stuck at "Bundle validation is in progress" for several hours now.

If I restart the Splunk service and try again, it does the same thing. I've made minor changes to the configuration to try to push it through, but it makes no difference.

Path Finder

In my situation I believe the cluster was 'out of sync' (my words, not Splunk's) because a bad config had been applied and the cluster-master had been restarted while it was attempting bundle validation.

I performed the following steps to resolve the issue:

  • On the cluster-manager run: /opt/splunk/bin/splunk show cluster-bundle-status - take note of all the bundle IDs

  • On the cluster-manager edit something in your bundle so that it gets a new checksum (e.g. add a comment to a file)

  • On the cluster-manager run: /opt/splunk/bin/splunk apply cluster-bundle

  • On the cluster-manager run: /opt/splunk/bin/splunk show cluster-bundle-status - you should see the master's latest_bundle ID change. At this point you should also see "cluster_status=Bundle validation is in progress." in the output.

  • This is where it gets stuck. Now restart the Splunk service on one indexer and watch the output of /opt/splunk/bin/splunk show cluster-bundle-status - that indexer's latest_bundle ID should also change.

  • Once that indexer has restarted, restart the Splunk service on each of the remaining indexers in your cluster.

  • When you restart the Splunk service on the 'final' indexer, the output of /opt/splunk/bin/splunk show cluster-bundle-status should show all indexers with the same latest_bundle ID

  • At this point the cluster-master should initiate a rolling restart of the cluster to apply the config. Once that is done, the active_bundle ID and latest_bundle ID should match.

And hopefully your problem is now fixed.

PS I also removed /opt/splunk/etc/slave-apps.old, but that probably wasn't required.
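
The final check in the steps above (all indexers reporting the same latest_bundle ID) can be scripted rather than eyeballed. This is only a minimal sketch: it assumes the status output contains lines of the form latest_bundle=<id> for each entry, which may not hold for every Splunk version, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each entry in the status output has a line
# containing "latest_bundle=<id>". Adjust the pattern if yours differs.

# Print the distinct latest_bundle IDs found on stdin, one per line.
latest_bundle_ids() {
    grep -o 'latest_bundle=[^[:space:]]*' | cut -d= -f2 | sort -u
}

# Succeed (exit status 0) only when exactly one distinct, non-empty
# latest_bundle ID appears on stdin.
bundles_in_sync() {
    local count
    count=$(latest_bundle_ids | grep -c . || true)
    [ "$count" -eq 1 ]
}
```

On the cluster-manager this could be used as: /opt/splunk/bin/splunk show cluster-bundle-status | bundles_in_sync && echo "all on the same bundle". The CLI may ask for credentials, so authenticate first.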

Explorer

I had something similar.

Out of 100 indexers, 8 had not changed their active_bundle value to the newest one.

I restarted them (took them offline and restarted them), and that resolved the issue.

PS: From the cluster master, you can run /opt/splunk/bin/splunk show cluster-bundle-status to get the bundle status of your peers.

Splunk Employee

From Splunk 7.0 onward there is a simpler way to handle this. There is an endpoint you can POST to that cancels the bundle push operation and gets the stuck cluster master out of the validation loop.

For more details, see the links below:

http://docs.splunk.com/Documentation/Splunk/7.0.0/RESTREF/RESTcluster#cluster.2Fmaster.2Fcontrol.2Fd...
http://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Configurationbundleissues

Contributor

Cancelling the bundle push didn't actually work for me. I had to restart the indexer peers one at a time (./splunk restart). A rolling restart from the CM didn't work either.
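
Restarting the peers one at a time can be scripted if you have SSH access to them. This is a rough sketch: the host names, the use of ssh, and the restart_peers_serially helper are all assumptions for illustration. Only the splunk restart command itself comes from this thread.

```shell
#!/usr/bin/env bash
# Restart Splunk on each peer in turn, stopping at the first failure.
# $1 is the command used to reach a peer (for example ssh);
# the remaining arguments are peer host names (hypothetical here).
restart_peers_serially() {
    local runner="$1"
    shift
    local peer
    for peer in "$@"; do
        echo "restarting splunk on $peer"
        "$runner" "$peer" /opt/splunk/bin/splunk restart || return 1
    done
}
```

Example call, with made-up host names: restart_peers_serially ssh idx1.example.com idx2.example.com. The loop only moves on to the next peer once the previous restart command has returned.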

New Member

Following https://docs.splunk.com/Documentation/Splunk/6.4.4/Indexer/Configurationbundleissues
I added the settings below to the indexer cluster master's server.conf:

[sslConfig]
allowSslCompression = false

[clustering]
heartbeat_timeout = 600

and commented out "pass4SymmKey".

I then restarted the Splunk service and applied the cluster bundle.
Hope it helps.

Contributor

Still works 4.5 years later

Communicator

still works 5.5 years later as well!

Path Finder

To confirm, the answer for me was restarting all the indexers while in maintenance mode. As predicted by cam343, this kicked off a rolling restart and moved all the indexers to the latest bundle version.
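
For anyone scripting this, here is a sketch of wrapping the restarts in maintenance mode on the cluster master. The enable maintenance-mode and disable maintenance-mode commands are standard Splunk CLI commands, but the with_maintenance_mode wrapper and the SPLUNK variable are invented here, and the CLI may ask for confirmation or credentials when run interactively.

```shell
#!/usr/bin/env bash
# Path to the Splunk CLI; override by setting SPLUNK (assumed location).
SPLUNK="${SPLUNK:-/opt/splunk/bin/splunk}"

# Enable maintenance mode, run the given command, then disable
# maintenance mode again whether or not the command succeeded.
# Returns the exit status of the wrapped command.
with_maintenance_mode() {
    "$SPLUNK" enable maintenance-mode || return 1
    local rc=0
    "$@" || rc=$?
    "$SPLUNK" disable maintenance-mode || return 1
    return "$rc"
}
```

Usage would look like: with_maintenance_mode restart_my_indexers, where restart_my_indexers stands for whatever command or function you use to restart the peers (a placeholder, not a real command).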

Contributor

In my case, I also had to restart the master.
Until I did that, it kept showing "Unsuccessful deployment" and refused to deploy the bundle.

Splunk Employee

Did the log files on either the master or the peers have any error messages?
