Deployment Architecture

Why is cluster master stuck at "Bundle validation is in progress" indefinitely after configuration-bundle update?

nwales
Path Finder

As above, I kicked off an update and the cluster master is stuck at "Bundle validation is in progress" and has been for several hours now.

If I restart the Splunk service and try again, it does the same thing. I've made minor changes to the configuration to try to push it through, but it makes no difference.

1 Solution

cam343
Path Finder

In my situation I believe the cluster was 'out of sync' (my words, not Splunk's) because a bad config had been applied and the cluster-master had been restarted while bundle validation was still being attempted.

I performed the following steps to resolve the issue:

  • On the cluster-manager run: /opt/splunk/bin/splunk show cluster-bundle-status - take note of all the bundle IDs

  • On the cluster-manager edit something in your bundle so that it gets a new checksum (e.g. add a comment to a file)

  • On the cluster-manager run: /opt/splunk/bin/splunk apply cluster-bundle

  • On the cluster-manager run: /opt/splunk/bin/splunk show cluster-bundle-status - you should see the master's latest_bundle ID change. At this point you should see: "cluster_status=Bundle validation is in progress." in the output.

This is where it gets stuck. At this point, restart the Splunk service on one indexer and watch the output of /opt/splunk/bin/splunk show cluster-bundle-status - that indexer's latest_bundle ID should also change.

  • Once that indexer has restarted, restart the Splunk service on the remaining indexers in your cluster.

  • When you restart the Splunk service on the 'final' indexer, the output from: /opt/splunk/bin/splunk show cluster-bundle-status - should show all indexers with the same latest_bundle ID

  • The cluster-master should then initiate rolling restarts of the cluster to apply the config. Once they finish, the active_bundle ID and latest_bundle ID should match.

And hopefully your problem is now fixed.

PS I also removed /opt/splunk/etc/slave-apps.old, but that probably wasn't required.
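
The "watch the output" parts of these steps could be scripted. Below is a minimal sketch, assuming the only marker worth relying on is the "cluster_status=Bundle validation is in progress" string quoted above. STATUS_CMD and both function names are made up for this example and are not part of Splunk. A return of 0 only means the marker is no longer present, not that the bundle was applied successfully.

```sh
# Sketch: wait until the manager stops reporting that bundle validation is in progress.
# STATUS_CMD and both function names are assumptions made for this example.
STATUS_CMD="${STATUS_CMD:-/opt/splunk/bin/splunk show cluster-bundle-status}"

# Succeeds if the status text on stdin contains the marker quoted in the steps above.
validation_in_progress() {
    grep -q 'cluster_status=Bundle validation is in progress'
}

# Usage: wait_for_validation [max_attempts] [sleep_seconds]
# Returns 0 once the marker is gone, 1 if it is still there after the last
# attempt, and 2 if the status command itself fails (for example, not logged in).
wait_for_validation() {
    max_attempts="${1:-30}"
    delay="${2:-10}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        output="$($STATUS_CMD)" || return 2
        if ! printf '%s\n' "$output" | validation_in_progress; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, after restarting an indexer you might run `wait_for_validation 60 10` on the cluster-manager and then check the bundle IDs by hand.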


dolbyjoab
Explorer

I had something similar.

Out of 100 indexers, 8 had not changed their active_bundle value to the newest one.

I took them offline and restarted them, which resolved the issue.

 

PS: from the cluster Master, you can run: /opt/splunk/bin/splunk show cluster-bundle-status  to get the bundle status of your peers


kkolla_splunk
Splunk Employee

From Splunk 7.0 onward there is a simpler way to handle this: an endpoint that you can POST to, which cancels the bundle push operation and gets the stuck cluster master out of the validation loop.

For more details, see the links below:

http://docs.splunk.com/Documentation/Splunk/7.0.0/RESTREF/RESTcluster#cluster.2Fmaster.2Fcontrol.2Fd...
http://docs.splunk.com/Documentation/Splunk/7.0.0/Indexer/Configurationbundleissues

samsplunks
Explorer

Splunk 8.x.x here.

Profiling settings blocked my apply cluster-bundle command.

/opt/splunk/bin/splunk apply cluster-bundle

Encountered some errors while applying the bundle.
Cannot apply (or) validate configuration settings. Bundle validation is in progress.


/opt/splunk/bin/splunk show cluster-bundle
...
<bundle_validation_errors on master>
...

 

This command did the trick:

curl -k -u admin https://CLUSTER_MASTER_IP:8089/services/cluster/master/control/default/cancel_bundle_push -X POST

 

And I could edit and apply the bundle afterwards.
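
If you need to do this on more than one cluster, the call could be wrapped in a small helper. This is only a sketch: the endpoint path is the one from the command above, while the function names, the SPLUNK_USER variable and the example host are placeholders I made up.

```sh
# Sketch: wrap the cancel_bundle_push call quoted above.
# Function names, SPLUNK_USER and the example host are placeholders (assumptions).

# Prints the endpoint URL for a manager host and an optional management port.
cancel_bundle_push_url() {
    [ -n "${1:-}" ] || { echo "usage: cancel_bundle_push_url HOST [PORT]" >&2; return 1; }
    printf 'https://%s:%s/services/cluster/master/control/default/cancel_bundle_push\n' \
        "$1" "${2:-8089}"
}

# Sends the POST request. With only "-u USER", curl prompts for the password.
# -k disables TLS certificate verification, as in the command above.
cancel_bundle_push() {
    url="$(cancel_bundle_push_url "$@")" || return 1
    curl -k -u "${SPLUNK_USER:-admin}" -X POST "$url"
}
```

Usage would be something like `cancel_bundle_push cm.example.com`, where cm.example.com stands in for your cluster master.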


wweiland
Contributor

Cancelling the bundle push didn't actually work for me. I had to restart (./splunk restart) the indexer peers one at a time. A rolling restart from the CM didn't work either.


siddesh333
New Member

Following https://docs.splunk.com/Documentation/Splunk/6.4.4/Indexer/Configurationbundleissues
I added the contents below to server.conf on the indexer cluster master:

[sslConfig]
allowSslCompression = false

[clustering]
heartbeat_timeout = 600

and commented out "pass4SymmKey".

I then restarted the Splunk service and applied the cluster bundle.
Hope it helps.


willsy
Communicator

You are a scholar and a gent. 7 years on, what you wrote is still helpful.

Thank you so much


wweiland
Contributor

Still works 4.5 years later

BainM
Communicator

still works 5.5 years later as well!

willsy
Communicator

still works 7 years later. 

champion


nwales
Path Finder

To confirm, the answer for me was restarting all the indexers while in maintenance mode. This, as predicted by cam343, kicked off a rolling restart and moved all the indexers to the latest bundle version.
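
For anyone who wants to script that, here is a rough sketch of the sequence, run from the cluster master. The peer host names, the use of ssh to reach the peers, and the SPLUNK_BIN and DRY_RUN variables are all assumptions for illustration. It defaults to a dry run that only prints the commands. Note that enabling maintenance mode may ask for confirmation, and that if a peer restart fails the script stops and leaves maintenance mode enabled so you can investigate.

```sh
# Sketch: restart peers one at a time while the manager is in maintenance mode.
# PEERS passed as arguments, ssh access, SPLUNK_BIN and DRY_RUN are assumptions.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Prints the command in dry-run mode, otherwise runs it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage (on the cluster master): restart_peers_in_maintenance idx1 idx2 ...
restart_peers_in_maintenance() {
    run "$SPLUNK_BIN" enable maintenance-mode || return 1
    for peer in "$@"; do
        # Stop at the first failure; maintenance mode stays enabled in that case.
        run ssh "$peer" "$SPLUNK_BIN restart" || return 1
    done
    run "$SPLUNK_BIN" disable maintenance-mode
}
```

Run it once with the default DRY_RUN=1 to review the commands, then set DRY_RUN=0 when you are happy with them.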


sansay
Contributor

In my case, I also had to restart the master.
Until I did that, it kept showing an unsuccessful deployment and refused to deploy the bundle.

mahamed_splunk
Splunk Employee

Did the log files on either master or peers have any error messages?
