We recently upgraded from version 8.1.6 to version 9.0.1 and have discovered a problem we had not seen before. Each time I push the apps from the deployer to the SHC, all apps get pushed, regardless of whether they are old, modified, or new. As a result, each apply shcluster-bundle takes 5-6 hours to complete; before the upgrade, a push took a couple of minutes.
Each push also generates a default.old.<DATE> directory for every app.
Is there a way to disable this behaviour? We would prefer not to accumulate lots of default.old.<DATE> directories, and we would like pushes to become much faster again.
Did anyone find a solution for this? Or is there a hotfix available to resolve it? It is causing a lot of issues for us, as the bundle size is quite large each time it is pushed.
Our current Splunk version is 8.2.3. Upgrading right now is a bit difficult due to other dependencies.
@ankitarath2011 , the issue discussed in this post occurs in Splunk 9.0.0 and 9.0.1. It does not occur in 8.x versions nor in versions from 9.0.2 onward.
How is your environment currently configured? See https://docs.splunk.com/Documentation/Splunk/8.2.3/DistSearch/PropagateSHCconfigurationchanges.
What deployer push mode are you using, and where are you setting this value in app.conf? Within an app on the search heads, or in $SPLUNK_HOME/etc/system/local/app.conf on the Deployer?
Our push mode is merge_to_default. All configuration is in $SPLUNK_HOME/etc/system/local/app.conf on the deployer.
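For anyone following along, the setup described above would look roughly like this (a minimal sketch based only on the values mentioned in this thread; confirm the file location and setting against the app.conf spec for your version):

```
# $SPLUNK_HOME/etc/system/local/app.conf on the deployer
[shclustering]
deployer_push_mode = merge_to_default
```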
On the search page, we are getting the error "ReplicationStatus: Failed-Failure info: failed_because_BUNDLE_SIZE_RETRIEVAL_FAILURE".
In the message box, we got a message saying the bundle size exceeds the limit. On checking, we could see all apps in $SPLUNK_HOME/var/run/splunk/deploy even though we had changed only a single file.
Please help on this.
We would also like to know whether there is any documentation on the impact of increasing maxBundleSize on system resources and performance (Splunk/system), and, if we do increase it, how much additional infrastructure would be needed.
Apologies for the delay, as I have been out of the office.
The issue you are reporting is very different from what is discussed in the current post and needs a new thread. Could you rephrase your full question in a new post and tag me in it? I tried to start a new one for you that we could continue on, but I'm not certain of the full context of your issue and question.
Regarding increasing maxBundleSize, it is normally a better practice to manage bundle sizes using the [replicationWhitelist] or [replicationBlacklist] stanzas in distsearch.conf. Raising bundle size limits or raising bundle replication timeouts can cause bundles to take longer to reach your indexers. By default, Search Heads use knowledge bundles to send nearly the entire contents of all of their apps to the indexers. If an app contains large binaries that do not need to be shared with the indexers, reduce the size of the bundle by whitelisting or blacklisting particular files or types of files.
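As a sketch of what that advice looks like in practice, a distsearch.conf on the search head might resemble the following. The stanza names are the documented ones; the pattern name, the path pattern, and the size value are purely illustrative, not recommendations:

```
# $SPLUNK_HOME/etc/system/local/distsearch.conf on the search head

[replicationBlacklist]
# Illustrative: keep packaged binaries out of the knowledge bundle.
noBinaries = apps/.../bin/*

[replicationSettings]
# Raise only as a last resort; value is in MB (illustrative).
maxBundleSize = 2048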
The short-term solution we went with was to update deployer_lookups_push_mode so that lookups are not updated as part of the push. Then, any time we needed to update the lookups, we leveraged the Lookup Editor app's API endpoint (/services/data/lookup_edit/lookup_contents) as part of the CI/CD pipeline we use to push updates to the cluster. There are example scripts posted online that you should be able to find fairly easily.
It's not the most elegant solution, but it's getting us through until we can upgrade 🙂
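To make the CI/CD approach above concrete, here is a minimal Python sketch that builds (but does not send) a POST request to the Lookup Editor endpoint mentioned above. The field names (namespace, lookup_file, contents) and the bearer-token auth are assumptions based on commonly shared example scripts, so verify them against your Lookup Editor version; the host and token below are placeholders:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_lookup_update_request(base_url, token, app, lookup_file, rows):
    """Build (without sending) a POST for the Lookup Editor app's
    lookup_contents endpoint. Field names follow commonly shared
    example scripts and may differ in your Lookup Editor version."""
    body = urlencode({
        "output_mode": "json",
        "namespace": app,              # app context that owns the lookup
        "lookup_file": lookup_file,    # e.g. "my_lookup.csv"
        "contents": json.dumps(rows),  # list of rows; first row is the header
    }).encode("utf-8")
    return Request(
        base_url + "/services/data/lookup_edit/lookup_contents",
        data=body,
        headers={"Authorization": "Bearer " + token},
        method="POST",
    )

# Example: stage an update that replaces the lookup with one data row.
req = build_lookup_update_request(
    "https://sh1.example.com:8089",   # hypothetical search head
    "REDACTED_TOKEN",                 # placeholder auth token
    "search",
    "my_lookup.csv",
    [["host", "owner"], ["web01", "alice"]],
)
```

Your pipeline would then send the request with whatever HTTP client and certificate handling it already uses.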
Thank you for your response. However, we are using preserve-lookups while executing the apply bundle command. In that case, will changing deployer_lookups_push_mode help fix it? Is it not the same as using -preserve-lookups true in the apply shcluster-bundle command?
Also, what value do you have for deployer_lookups_push_mode?
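For reference, the command-line form being discussed is the -preserve-lookups flag on the apply command; a typical invocation (the target host and credentials here are placeholders) looks something like:

```
splunk apply shcluster-bundle -target https://sh1.example.com:8089 \
    -preserve-lookups true -auth admin:changeme
```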
Your issue may have to do with changes to the deployer_lookups_push_mode setting in app.conf. See the explanation in the 9.0.2 README:
In 9.0, a change was made to the behavior of the default preserve_lookups value for the deployer_lookups_push_mode setting in app.conf. This change fixed a behavior issue that caused preserve_lookups to not conform to its documented and intended behavior. Rather, prior to 9.0.0, the preserve_lookups value conformed instead to the documented behavior of the always_preserve value.
However, although the 9.0.0 change fixed the behavior of the preserve_lookups value, the change also led to performance degradation when using that value because of additional processing needed to attain the intended result. In addition, the fix changed the default behavior of the deployer_lookups_push_mode setting, which introduced an additional problem, since some users had come to expect and rely on the pre-9.0.0 behavior of the default preserve_lookups value, buggy though it was.
To counteract the resultant performance degradation and change to expected default behavior introduced by the 9.0.0 change to the default value, in 9.0.2, the default value for the deployer_lookups_push_mode setting was changed to always_preserve. This change to the default causes the default behavior of the setting to conform to the unfixed behavior of the preserve_lookups value prior to the change in 9.0.0.
The documentation for the default behavior of the deployer_lookups_push_mode setting in app.conf has also been updated for 9.0.2. See: https://docs.splunk.com/Documentation/Splunk/9.0.2/DistSearch/PropagateSHCconfigurationchanges#Prese...
If this is your issue, your options may be:

```
[shclustering]
deployer_lookups_push_mode = always_preserve
```

or

```
[shclustering]
deployer_lookups_push_mode = preserve_lookups
```
I hope this helps!
This behavior definitely still exists - it wasn't fixed in 9.0.1. We went from 8.2.4 to 9.0.3, and deployments are taking a lot longer because the deployer pushes all the apps, no matter whether they've been updated or not.