Does bundle replication happen every time there is a change to the bundle? If the CRC changes, is the latest bundle replicated to all of the search peers, or only the .delta file?
What exactly is the purpose of the .delta bundle?
Yes, it is replicated each time there is a change and a new search has to run. The delta bundle is pushed simply so that the entire bundle doesn't have to be copied over the network, only the changed portions.
Err, a long time ago we pushed knowledge bundle changes when a search started. Currently we push them on a combination of two timers: every 60 seconds we ask each peer for its current state, and every 60 seconds we update the local state, compare it against our last-known view of each peer's state, and push the updated bundle if needed.
So by default, the delay between a change and the push of the updated data is somewhere between 0 and 120 seconds, regardless of whether any searches run against the peer. Because these timers are not coordinated with each other and can drift as they do their work, the exact delay varies. Since the delay is effectively the sum of two waits of up to 60 seconds each, a delay near 60 seconds is more likely than one near 0 or 120.
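The 0 to 120 second range and the peak around 60 seconds are what you get if you model the delay as two independent waits. The Python sketch below is a toy model, not Splunk code: it assumes a change must first be picked up on a tick of one timer and then acted on at the next tick of the other, uncoordinated timer, with each timer's phase chosen at random. The function names (`next_tick`, `simulate_delay`) are hypothetical.

```python
import math
import random

PERIOD = 60.0  # seconds; the default interval described above


def next_tick(t: float, phase: float, period: float = PERIOD) -> float:
    """First firing time at or after t for a timer that fires at phase + k * period."""
    k = math.ceil((t - phase) / period)
    return phase + k * period


def simulate_delay(rng: random.Random) -> float:
    """One hypothetical change: wait for timer A, then for the uncoordinated timer B."""
    phase_a = rng.uniform(0.0, PERIOD)
    phase_b = rng.uniform(0.0, PERIOD)
    change_at = rng.uniform(0.0, 10 * PERIOD)
    seen_at = next_tick(change_at, phase_a)   # first timer notices the change
    pushed_at = next_tick(seen_at, phase_b)   # second timer acts on it
    return pushed_at - change_at


rng = random.Random(42)
delays = [simulate_delay(rng) for _ in range(100_000)]

# Coarse 20-second histogram: counts rise towards the 60 s bins and fall off again.
for lo in range(0, 120, 20):
    count = sum(lo <= d < lo + 20 for d in delays)
    print(f"{lo:3d}-{lo + 20:3d} s: {count}")
```

Each wait is uniform over 0 to 60 seconds, so their sum has a triangular distribution on 0 to 120 seconds that peaks at 60, which matches the "60 is more likely than 0 or 120" remark.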
This is helpful. Thank you!
Would Splunk not need to copy the entire bundle (after a change) to determine the delta of two bundles?
It doesn't need to, because both the old and new versions already exist on the search head, so it can compute the delta there instead of copying a large file over the network for a small change.
Thank you for the explanation. I have been trying to find this in the documentation, but it is vague on this point.
I would have expected the opposite: that it would need to transfer the entire contents whenever the CRC changed, and then extract the delta once it had copies of both files. I wish the admin guide were clearer on this.
The process of distributing knowledge bundles means that indexers by default receive nearly the entire contents of all the search head's apps. If an app contains large binaries that do not need to be shared with the indexers, you can reduce the size of the bundle by means of the [replicationWhitelist] or [replicationBlacklist] stanza in distsearch.conf. See "Limit knowledge bundle size" in this manual.
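To make the whitelist/blacklist idea concrete, here is a rough Python sketch of how such patterns could filter the candidate files. It is not Splunk's implementation: the helper names, example paths and pattern names (such as `excludeMyAppBin`) are hypothetical, and the wildcard handling is a simplification that assumes `...` matches across directory separators, `*` does not, everything else is treated as a regex, and blacklist entries win over whitelist entries.

```python
import re


def splunk_pattern_to_regex(pattern: str) -> re.Pattern:
    """Simplified, hypothetical translation of a Splunk-style path pattern.

    '...' matches anything (including directory separators), '*' matches
    anything except a separator; every other character is kept as regex.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/\\]*")
            i += 1
        else:
            parts.append(pattern[i])
            i += 1
    return re.compile("".join(parts))


def replicated_files(paths, whitelist, blacklist):
    """Return the paths that would go into the bundle under this sketch's rules."""
    allow = [splunk_pattern_to_regex(p) for p in whitelist.values()]
    deny = [splunk_pattern_to_regex(p) for p in blacklist.values()]
    kept = []
    for path in paths:
        # An empty whitelist means "everything is a candidate" in this sketch.
        if allow and not any(rx.fullmatch(path) for rx in allow):
            continue
        # Blacklist entries take precedence over whitelist entries.
        if any(rx.fullmatch(path) for rx in deny):
            continue
        kept.append(path)
    return kept


candidates = [
    "apps/search/local/props.conf",
    "apps/myapp/bin/big_model.bin",
    "apps/myapp/default/transforms.conf",
    "apps/myapp/lookups/users.csv",
]
blacklist = {"excludeMyAppBin": "apps/myapp/bin/..."}  # hypothetical entry
print(replicated_files(candidates, {}, blacklist))
# ['apps/search/local/props.conf', 'apps/myapp/default/transforms.conf', 'apps/myapp/lookups/users.csv']
```

Check the distsearch.conf specification for the exact matching rules before relying on a pattern; this sketch only shows why excluding a large binary directory shrinks what gets hashed and shipped.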
In case it isn't clear: we have a hash of each file in the set of files that could potentially be transferred, and we ask the peer for the set of hashes it holds for this search head (identified by its name).
If they all match, there is nothing to transfer. If some do not match, we send the entire contents of each non-matching file as part of one delta bundle. When generating the hashes, we use a few tricks to eliminate content that doesn't matter, such as comment lines or stanzas that can never matter on a search peer.
When we actually push the files, I believe we send the entire original form.
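The hash exchange described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Splunk's implementation: SHA-256, the comment-stripping rule (only lines starting with `#` in `.conf` files), the gzipped tar format and all function names are hypothetical choices. It compares normalized hashes but, in line with the last reply, packs each changed file in its original form.

```python
import hashlib
import io
import tarfile


def normalized_hash(path: str, body: str) -> str:
    """Hash a file, ignoring '#' comment lines in .conf files (hypothetical rule)."""
    if path.endswith(".conf"):
        lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith("#")]
        body = "\n".join(lines)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def manifest(files: dict[str, str]) -> dict[str, str]:
    """Map each candidate path to its normalized hash."""
    return {path: normalized_hash(path, body) for path, body in files.items()}


def files_to_send(local: dict[str, str], peer: dict[str, str]) -> list[str]:
    """Paths whose hash is missing on the peer or differs from the peer's copy."""
    return sorted(path for path, digest in local.items() if peer.get(path) != digest)


def build_delta_bundle(files: dict[str, str], changed: list[str]) -> bytes:
    """Pack the ORIGINAL contents (comments included) of the changed files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in changed:
            data = files[path].encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# What the peer last reported for this search head (computed from older files).
peer_hashes = manifest({
    "local/props.conf": "# old note\n[foo]\nKV_MODE = json\n",
    "lookups/users.csv": "user,role\n",
})

# Current files on the search head: only a comment changed in props.conf.
search_head = {
    "local/props.conf": "# tuned by ops\n[foo]\nKV_MODE = json\n",
    "lookups/users.csv": "user,role\nalice,admin\n",
}

changed = files_to_send(manifest(search_head), peer_hashes)
print(changed)  # ['lookups/users.csv']
delta = build_delta_bundle(search_head, changed)
```

The comment-only edit to props.conf does not trigger a transfer, while the changed lookup is shipped whole. The sketch does not model the "stanzas that can never matter on a search peer" trick or files deleted on the search head.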