Why is cluster master stuck at "Bundle validation is in progress" indefinitely after configuration-bundle update?

nwales
Path Finder

As the title says, I kicked off a bundle update, and the cluster master has been stuck at "Bundle validation is in progress" for several hours now.

If I restart the Splunk service and try again, the same thing happens. I've made minor configuration changes to force a new push, but it makes no difference.


dolbyjoab
Explorer

I had something similar.

Of 100 indexers, 8 had not updated their active_bundle value to the newest one.

I took those 8 offline and restarted them, which resolved the issue.

PS: From the cluster master, you can run /opt/splunk/bin/splunk show cluster-bundle-status to get the bundle status of your peers.
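In a large cluster, picking out the handful of peers whose active_bundle ID lags behind can be done with a small filter. This is a minimal sketch, not part of the original post: it assumes you have already reduced the show cluster-bundle-status output to one "peer_name active_bundle_id" line per peer (a hypothetical intermediate format, not the CLI's actual output), and stale_peers is an illustrative helper name.

```bash
#!/usr/bin/env bash
# Sketch: list peers whose active_bundle ID differs from the expected bundle ID.
# ASSUMPTION: input is hypothetical "peer_name active_bundle_id" lines, one per
# peer, that you extracted yourself from `splunk show cluster-bundle-status`.
set -euo pipefail

# Usage: stale_peers EXPECTED_BUNDLE_ID < peers.txt
stale_peers() {
  local expected="$1"
  # Print the peer name for every line whose second field is not the expected ID.
  awk -v want="$expected" 'NF >= 2 && $2 != want { print $1 }'
}
```

Each name it prints is a candidate for the offline-and-restart treatment described above.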

kkolla_splunk
Splunk Employee

Since Splunk 7.0 there is a simpler way to handle this: a REST endpoint you can POST to that cancels the bundle push operation and gets the stuck cluster master out of the validation loop.

For more details, see these links:

http://docs.splunk.com/Documentation/Splunk/7.0.0/RESTREF/RESTcluster#cluster.2Fmaster.2Fcontrol.2Fd...
http://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Configurationbundleissues

samsplunks
Explorer

Splunk 8.x.x here.

Profiling settings blocked my apply bundle command:

/opt/splunk/bin/splunk apply cluster-bundle

Encountered some errors while applying the bundle.
Cannot apply (or) validate configuration settings. Bundle validation is in progress.

/opt/splunk/bin/splunk show cluster-bundle
...
<bundle_validation_errors on master>
...

This command did the trick:

curl -k -u admin https://CLUSTER_MASTER_IP:8089/services/cluster/master/control/default/cancel_bundle_push -X POST

And I could edit and apply the bundle afterwards.
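The same call can be wrapped in a small function so the host is the only thing you type. This is a minimal sketch, assuming the manager uses the default management port 8089 (as in the command above); cancel_bundle_push and the DRY_RUN switch are hypothetical names. Like the original command, it passes -k, which skips TLS certificate verification, and curl prompts for the password because -u is given only a user name.

```bash
#!/usr/bin/env bash
# Sketch: cancel a stuck bundle push via the REST endpoint from the post above.
# ASSUMPTIONS: default management port 8089; function name and DRY_RUN are hypothetical.
set -euo pipefail

# Usage: cancel_bundle_push MANAGER_HOST [USER]
cancel_bundle_push() {
  local host="$1" user="${2:-admin}"
  local url="https://${host}:8089/services/cluster/master/control/default/cancel_bundle_push"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Only print the command that would be run.
    echo "curl -k -u ${user} -X POST ${url}"
  else
    # -k skips TLS verification; curl prompts for the password.
    curl -k -u "$user" -X POST "$url"
  fi
}
```

Afterwards, /opt/splunk/bin/splunk show cluster-bundle-status on the manager shows whether it has left the validation state.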

wweiland
Contributor

Cancelling the bundle push didn't actually work for me. I had to restart the indexer peers one at a time (./splunk restart). A rolling restart from the CM didn't work either.

siddesh333
New Member

Following https://docs.splunk.com/Documentation/Splunk/6.4.4/Indexer/Configurationbundleissues, I added the following to the indexer cluster master's server.conf:

[sslConfig]
allowSslCompression = false

[clustering]
heartbeat_timeout = 600

and commented out "pass4SymmKey".

I then restarted the Splunk service and applied the cluster bundle.
Hope it helps.

cam343
Path Finder

In my situation I believe the cluster was 'out of sync' (my words, not Splunk's) because a bad config had been applied and the cluster-master had been restarted while bundle validation was in progress.

I performed the following steps to resolve the issue:

  • On the cluster-manager, run /opt/splunk/bin/splunk show cluster-bundle-status and take note of all the bundle IDs.

  • On the cluster-manager, edit something in your bundle so that it gets a new checksum (e.g. add a comment to a file).

  • On the cluster-manager, run /opt/splunk/bin/splunk apply cluster-bundle

  • On the cluster-manager, run /opt/splunk/bin/splunk show cluster-bundle-status. You should see the master's latest_bundle ID change, and the output should show "cluster_status=Bundle validation is in progress."

  • This is where it gets stuck. At this point, restart the Splunk service on one indexer and watch the output of /opt/splunk/bin/splunk show cluster-bundle-status: that indexer's latest_bundle ID should also change.

  • Once that indexer has restarted, restart the Splunk service on each of the remaining indexers in your cluster.

  • After you restart the Splunk service on the final indexer, the output of /opt/splunk/bin/splunk show cluster-bundle-status should show the same latest_bundle ID for all indexers.

  • At this point the cluster-master should initiate a rolling restart of the cluster to apply the config, after which the active_bundle ID and latest_bundle ID should match.

And hopefully your problem is now fixed.

PS: I also removed /opt/splunk/etc/slave-apps.old, but that probably wasn't required.
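The restart phase of the steps above (one indexer at a time, checking bundle status between restarts) can be sketched as a small bash script. This is a minimal sketch, not part of the original answer, and it does not cover the bundle edit or apply cluster-bundle steps. It assumes passwordless ssh from the cluster-manager to each peer and Splunk at /opt/splunk on every host; the peer hostnames, the helper names, and the DRY_RUN and PAUSE variables are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: restart indexer peers one at a time, showing bundle status after each.
# ASSUMPTIONS: passwordless ssh from the manager to each peer, Splunk installed at
# /opt/splunk everywhere; hostnames, DRY_RUN and PAUSE are hypothetical.
set -euo pipefail

SPLUNK_BIN="/opt/splunk/bin/splunk"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: restart_peers_serially PEER_HOST...
restart_peers_serially() {
  local peer
  for peer in "$@"; do
    run ssh "$peer" "$SPLUNK_BIN" restart
    # Show status on the manager so you can confirm this peer's latest_bundle ID changed.
    run "$SPLUNK_BIN" show cluster-bundle-status
    if [[ "${PAUSE:-1}" == "1" ]]; then
      # Wait for the operator before touching the next peer (needs an interactive stdin).
      read -r -p "Press Enter once ${peer}'s latest_bundle ID has changed... "
    fi
  done
}
```

For example, DRY_RUN=1 PAUSE=0 restart_peers_serially idx1 idx2 only prints the commands it would run, which is a cheap way to review the order before touching a live cluster.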

willsy
Communicator

You are a scholar and a gent. 7 years on, what you wrote is still helpful.

Thank you so much

wweiland
Contributor

Still works 4.5 years later

BainM
Communicator

still works 5.5 years later as well!

willsy
Communicator

still works 7 years later. 

champion

nwales
Path Finder

To confirm, the fix for me was restarting all the indexers while in maintenance mode. As cam343 predicted, this kicked off a rolling restart and moved all the indexers to the latest bundle version.
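Maintenance mode can be bracketed around whatever restart procedure you use. This is a minimal sketch, assuming it runs on the cluster master with Splunk at /opt/splunk; with_maintenance_mode and DRY_RUN are hypothetical names. splunk enable maintenance-mode and splunk disable maintenance-mode are the manager's CLI commands, but they may ask for confirmation, so run this interactively.

```bash
#!/usr/bin/env bash
# Sketch: run a command with the cluster in maintenance mode, then turn it off again.
# ASSUMPTIONS: runs on the cluster master, Splunk at /opt/splunk; helper names hypothetical.
set -euo pipefail

SPLUNK_BIN="/opt/splunk/bin/splunk"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then echo "$*"; else "$@"; fi
}

# Usage: with_maintenance_mode COMMAND [ARGS...]
with_maintenance_mode() {
  run "$SPLUNK_BIN" enable maintenance-mode
  local rc=0
  # Capture the wrapped command's exit status instead of aborting under set -e,
  # so maintenance mode is disabled even if the restarts fail.
  "$@" || rc=$?
  run "$SPLUNK_BIN" disable maintenance-mode
  return "$rc"
}
```

For example, with_maintenance_mode ./restart-peers.sh idx1 idx2 would wrap a peer-restart script of your own (restart-peers.sh is a hypothetical name) and return that script's exit status.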

sansay
Contributor

In my case, I also had to restart the master.
Until I did that, it kept showing the unsuccessful deployment and refused to deploy the bundle.

mahamed_splunk
Splunk Employee

Did the log files on either master or peers have any error messages?
