While digging through my search head logs, I came across WARN messages from the DistributedBundleReplicationManager component stating that "Asynchronous bundle replication" "took too long (longer than 10 seconds)". The knowledge bundle the search head is currently replicating is about 250 MB in size.
Looking back historically, I found that this warning has occurred approximately 50 times an hour for the last month or so, with each message reporting a bundle size of about 250 MB on average. Most of the large files in the bundle are lookup files, but the majority of them are static and change rarely, if ever. My understanding was that the search head and search peers keep file hash records of the knowledge bundle components and replicate only the delta of that bundle. Judging by these messages, however, the search head appears to be replicating the full 250 MB knowledge bundle multiple times per hour.
Is this the expected behavior? I know that most of /etc/apps is being replicated as well, but to my knowledge nothing is changing often enough to require the search head to send the entire replication bundle 50 times an hour.
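For anyone who wants to confirm the hourly rate on their own system, here is a rough Python sketch that counts these warnings per hour from splunkd.log lines. The sample lines are made up for illustration, and the timestamp layout is an assumption that may differ between versions; only the component name and the two quoted phrases come from the messages described above.

```python
import re
from collections import Counter

# Made-up sample lines; the exact splunkd.log layout is an assumption.
SAMPLE_LOG = """\
12-03-2016 10:01:12.101 -0500 WARN  DistributedBundleReplicationManager - Asynchronous bundle replication took too long (longer than 10 seconds)
12-03-2016 10:14:40.553 -0500 WARN  DistributedBundleReplicationManager - Asynchronous bundle replication took too long (longer than 10 seconds)
12-03-2016 10:20:02.000 -0500 INFO  SomeOtherComponent - unrelated message
12-03-2016 11:02:45.310 -0500 WARN  DistributedBundleReplicationManager - Asynchronous bundle replication took too long (longer than 10 seconds)
"""

# Capture "MM-DD-YYYY HH" as the hour bucket, then require the WARN level,
# the component name, and both phrases quoted in the warning.
LINE_RE = re.compile(
    r"^(\d{2}-\d{2}-\d{4} \d{2}):\d{2}:\d{2}\.\d{3} \S+ WARN\s+"
    r"DistributedBundleReplicationManager\b"
    r".*Asynchronous bundle replication.*took too long"
)


def warnings_per_hour(lines):
    """Return a Counter mapping 'MM-DD-YYYY HH' to the number of matching warnings."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for hour, count in sorted(warnings_per_hour(SAMPLE_LOG.splitlines()).items()):
        print(hour, count)
```

To run it against a real log, pass the open file object instead of the sample string, since the function accepts any iterable of lines.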
What is the threshold on blacklisting large lookup files? Do the search peers need the lookup files? I thought the lookup files were used at search time on the Search head.
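Before deciding what to blacklist, it may help to see which lookup files are actually large. The sketch below is only an illustration: the `apps/*/lookups/*` layout under the etc directory and the function name `large_lookups` are assumptions, and the size cutoff is an arbitrary value to adjust, not a recommended threshold.

```python
from pathlib import Path


def large_lookups(etc_dir, min_bytes):
    """Return (size_in_bytes, path_relative_to_etc) for lookup files of at
    least min_bytes, largest first.

    Assumes lookups live under <etc_dir>/apps/<app>/lookups/.
    """
    etc = Path(etc_dir)
    hits = []
    for path in etc.glob("apps/*/lookups/*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                hits.append((size, path.relative_to(etc).as_posix()))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Hypothetical install path and a 10 MB cutoff; change both as needed.
    for size, rel_path in large_lookups("/opt/splunk/etc", 10 * 1024 * 1024):
        print(f"{size / (1024 * 1024):8.1f} MB  {rel_path}")
```

The relative paths it prints are candidates to review, not a list of files that are safe to exclude; whether the search peers need a given lookup depends on how it is used in searches.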
For v6.5.0 and 6.5.1, a known issue...
( http://docs.splunk.com/Documentation/Splunk/6.5.1/ReleaseNotes/Knownissues )
2016-12-03 SPL-133450, SPL-134084, SPL-134083, SPL-134427 6.5+ splunk does full bundle replication everytime - slowing down the system
By the way, as far as I understand, the current design of bundle replication does not do differential or incremental deltas.
The bug is that there is no delta in v6.5.0 and v6.5.1
Thanks for the info. We haven't quite made it to 6.5 and are still sitting on a 6.3 deployment. It's interesting to see that this is a bug in the newer releases.
According to the Splunk docs, in a distributed search environment the search head should be distributing only the knowledge bundle delta: "...Splunk Enterprise uses delta-based replication to keep the bundle compact, with the search head usually only replicating the changed portion of the bundle to its search peers."
I review debug logs for bundle replication on and off.
This full and delta behavior was introduced in v4.2 (yes, it is an old design), and as far as I remember the basic idea has never changed apart from bug fixes. So I'm fairly confident in what I generally see in the logs.
I will file a doc review request with the dev team.
You can enable DEBUG for the DistributedBundleReplicationManager component in splunkd.log for further debugging. Possibly the full bundle is taking too long to transfer, or the bundle does not match across all peers, so some peers always require a full bundle download.
I was wrong about the frequency of full bundle and delta replication in the present behavior.
I quickly tested in v6.4.5 with a lookup table update every minute. Over 15 minutes, only deltas were sent, according to metrics.log.