Deployment Architecture

Why is the search head distributing the entire knowledge bundle 50+ times an hour instead of sending .delta files?

RJ_Grayson
Path Finder

While digging through my search head logs, I stumbled upon some WARN messages from the DistributedBundleReplicationManager component saying that "Asynchronous bundle replication" "took too long (longer than 10 seconds)". The knowledge bundle the search head is currently replicating is about 250 MB in size.

Looking back through the history, I found that this warning has been occurring approximately 50 times an hour for the last month or so, with each message reporting a bundle averaging about 250 MB. Most of the large files in the bundle are lookup files, but the majority of these lookups are static and change rarely, if ever. It was my understanding that the search head and search peers kept file hash records of the knowledge bundle components and only replicated the delta of that bundle. Judging by these warnings, the search head appears to be replicating the full 250 MB knowledge bundle multiple times per hour.
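To make the hash-record idea above concrete, here is a minimal Python sketch of hash-based delta detection. It only illustrates the general technique and is not Splunk's actual implementation; the function names `build_manifest` and `compute_delta` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_root):
    """Map each file path (relative to bundle_root) to its SHA-256 digest."""
    root = Path(bundle_root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compute_delta(previous, current):
    """Return (changed_or_added, removed) relative paths between two manifests."""
    # A file belongs in the delta if it is new or its digest differs.
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    # Files present before but missing now would have to be deleted on the peer.
    removed = sorted(p for p in previous if p not in current)
    return changed, removed
```

Under this scheme a static 200 MB lookup never appears in `changed`, so only the handful of files that were actually modified would need to travel to the peers.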

Is this the expected behavior? I know that most of /etc/apps is being replicated as well, but to my knowledge nothing is changing often enough to require the search head to send the entire replication bundle 50 times an hour.

What is the threshold on blacklisting large lookup files? Do the search peers need the lookup files? I thought the lookup files were used at search time on the Search head.
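As a starting point for deciding which lookups might be worth blacklisting, a short Python sketch like the one below could list the largest lookup files. It assumes the usual `<app>/lookups/` layout under the apps directory; the function name `find_large_lookups` and the size threshold are hypothetical and are not a Splunk-recommended value.

```python
from pathlib import Path


def find_large_lookups(apps_dir, min_bytes):
    """Return (size_in_bytes, path) pairs for files under <app>/lookups/
    that are at least min_bytes in size, largest first."""
    results = []
    for path in Path(apps_dir).glob("*/lookups/*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= min_bytes:
                results.append((size, path.as_posix()))
    return sorted(results, reverse=True)
```

For example, `find_large_lookups("/opt/splunk/etc/apps", 10 * 1024 * 1024)` would report lookups of 10 MiB or more, assuming Splunk is installed under `/opt/splunk`. Whether a given file can safely be excluded still depends on whether searches running on the peers need it.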


mirkoneverstops
Path Finder

Fixed in 6.5.2 as mentioned in "Release Notes": https://docs.splunk.com/Documentation/Splunk/6.5.2/ReleaseNotes/6.5.2


Masa
Splunk Employee

For v6.5.0 and v6.5.1, this is a known issue
( http://docs.splunk.com/Documentation/Splunk/6.5.1/ReleaseNotes/Knownissues ):
2016-12-03 SPL-133450, SPL-134084, SPL-134083, SPL-134427 6.5+ splunk does full bundle replication everytime - slowing down the system

By the way, as far as I understand, the current design of bundle replication does not chain differential or incremental deltas. Instead, it alternates between a full bundle and a delta against that full bundle:

  1. Full bundle
  2. Delta (modified files) from the previous full bundle
  3. Full bundle
  4. Delta (modified files) from the previous full bundle
  5. The full/delta cycle repeats...

The bug is that no delta is sent at all in v6.5.0 and v6.5.1.


RJ_Grayson
Path Finder

Masa,

Thanks for the info. We haven't quite made it to 6.5 and are still sitting on a 6.3 deployment. It's interesting to see that this is a bug in the newer releases.

According to the Splunk docs, in a distributed search environment the search head should distribute only the knowledge bundle delta: "...Splunk Enterprise uses delta-based replication to keep the bundle compact, with the search head usually only replicating the changed portion of the bundle to its search peers."
docs.splunk.com/Documentation/Splunk/6.5.1/DistSearch/Limittheknowledgebundlesize


Masa
Splunk Employee

I review debug logs for bundle replication on and off.
This full-and-delta behavior was introduced in v4.2 (yes, it is an old design), and as far as I remember the basic idea has never changed apart from bug fixes. So, in general, I'm confident in what I see in the logs.

I will file a doc review request with the dev team.

You can enable DEBUG for the DistributedBundleReplicationManager component in splunkd.log for further debugging. Possibly the full bundle is taking too long to replicate, or the bundle does not match across all peers, so some peers always have to download the full bundle.


Masa
Splunk Employee

I was wrong about the frequency of full bundles and deltas in the present behavior.
I quickly tested v6.4.5 with a lookup table updated every minute. Over 15 minutes, only deltas were sent according to metrics.log.
