Deployment Architecture

What does this event mean? "WARN DistributedBundleReplicationManager - bundle replication to 24 peer(s) took too long"

Communicator

I'm working to improve search performance in my search head pooling (SHP) environment with 4 search heads and 24 indexers. We are seeing many of these events in splunkd.log:

11-26-2012 19:39:40.745 +0000 WARN DistributedBundleReplicationManager - bundle replication to 24 peer(s) took too long (18564ms), bundle file size=13950KB, replication_id=1354804280

11-26-2012 19:42:29.211 +0000 WARN DistributedBundleReplicationManager - bundle replication to 24 peer(s) took too long (27963ms), bundle file size=13950KB, replication_id=1354804566

Both the search head and indexing tiers are on version 4.3.5.

What exactly is bundle replication? Can I turn it off?

What is the impact of these long replication times on my searches?

1 Solution

Builder

Answers to these questions:

  • What are bundles?
    When initiating a distributed search, the search head replicates and distributes its knowledge objects to its search peers. Knowledge objects include saved searches, event types, and other entities used in searching across indexes. The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf. The set of data that the search head distributes is called the knowledge bundle.

    The indexers use the search head's knowledge bundle to execute queries on its behalf. When executing a distributed search, the indexers are ignorant of any local knowledge objects. They have access only to the objects in the search head's knowledge bundle.

    The process of distributing knowledge bundles means that indexers by default receive nearly the entire contents of all the search head's apps. If an app contains large binaries that do not need to be shared with the indexers, you can reduce the size of the bundle by means of the [replicationWhitelist] or [replicationBlacklist] stanza in distsearch.conf. See "Limit knowledge bundle size" in this manual.

    The knowledge bundle gets distributed to the $SPLUNK_HOME/var/run/searchpeers/ directory on each search peer. Because the search head distributes its knowledge, search scripts should not hardcode paths to resources. The knowledge bundle will reside at a different location on the search peer's file system, so hardcoded paths will not work properly.

    By default, the search head replicates and distributes the knowledge bundle to each search peer. For greater efficiency, you can instead tell the search peers to mount the knowledge bundle's directory location, eliminating the need for bundle replication. When you mount a knowledge bundle, it's referred to as a mounted bundle. To learn how to mount bundles, read "Mount the knowledge bundle".

What is bundle replication?

  • Essentially, the search head sends the required knowledge objects to the indexers/peers so that they can run searches on its behalf. You can turn replication off if you use mounted bundles.

What is the impact?

Bundle replication happens asynchronously from search, and there is some impact on performance. The main consequence of long bundle replication times is that changes made on the search head take longer to become effective, since they take effect only after the indexers have the new bundle. (This is straight from one of the devs.)

Mounting your knowledge bundles would help, as detailed here: http://docs.splunk.com/Documentation/Splunk/5.0/Deploy/Mounttheknowledgebundle

However, mounting any type of share over a WAN is going to yield very poor results and is not recommended by Splunk.

To clarify even further: a search will not be prevented from running just because knowledge bundle replication has not finished. If a search is launched from the search head while a bundle is still replicating, the search head tells the peers to use the last common bundle, i.e. the most recent bundle that all of them have received. This prevents erroneous results and makes sure all peers use the same bundle for that search.
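
To make the "last common bundle" idea concrete, here is a minimal Python sketch. It is only an illustration of the behavior described above, not Splunk's actual implementation; the function name, the peer names, and the assumption that a larger replication_id means a newer bundle are all mine.

```python
def last_common_bundle(peer_bundles):
    """Return the newest bundle id that every peer already holds, or None.

    peer_bundles maps a (hypothetical) peer name to the bundle ids it has
    fully received. Assumes a larger id means a newer bundle.
    """
    if not peer_bundles:
        return None
    common = set.intersection(*(set(ids) for ids in peer_bundles.values()))
    return max(common) if common else None


peers = {
    "idx01": [1354804280, 1354804566],
    "idx02": [1354804280, 1354804566],
    "idx03": [1354804280],  # still receiving 1354804566
}
print(last_common_bundle(peers))  # 1354804280
```

Until idx03 finishes receiving the newer bundle, every peer would be told to search with the older one, which is why changes on the search head appear to lag.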


Splunk Employee

Increasing the number of threads performing bundle replication can reduce the amount of time it takes for bundles to replicate to peers. It is safe to set replicationThreads to 1 thread per physical core on the Search Head.
On the Search Heads, in distsearch.conf (must be configured in $SPLUNK_HOME/etc/system/local):
[replicationSettings]
replicationThreads = <1 thread per SH physical core>
* The maximum number of threads to use when performing bundle replication to peers.
* Must be a positive number
* Defaults to 5.

For example, if your search head has 24 physical cores, you can safely set replicationThreads = 24. This allows your search head to open 24 threads (one to each indexer in this case) for bundle replication.
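
As a rough back-of-the-envelope check, the effect of the thread count can be sketched in Python. This is a simplified model that I am assuming for illustration: each thread serves one peer at a time and every transfer takes about the same time, which real networks will not honor exactly.

```python
import math


def replication_waves(peers, threads):
    """Number of sequential rounds needed to reach all peers.

    Simplified model (an assumption): one peer per thread at a time,
    all transfers take roughly equal time.
    """
    if peers < 0 or threads < 1:
        raise ValueError("peers must be >= 0 and threads must be >= 1")
    return math.ceil(peers / threads)


print(replication_waves(24, 5))   # 5 rounds with the default of 5 threads
print(replication_waves(24, 24))  # 1 round with one thread per peer
```

Under this model, 24 peers at the default setting take about five rounds of transfers, while one thread per peer finishes in a single round.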

You can also use [replicationBlacklist] to reduce the size of the knowledge bundle. Since bin directories, jar files, and lookup files do not need to be replicated to search peers, you can blacklist them in distsearch.conf on each search head:

$SPLUNK_HOME/etc/system/local/distsearch.conf
[replicationBlacklist]
noBinDir = (.../bin/*)
nojavabin = apps/splunk_archiver/java-bin/...

Note: if you use the lookup command in a search, give it the option local=true. Automatic lookups, however, will not work if you blacklist lookups from the knowledge bundle.
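
Before blacklisting anything, it can help to estimate how much of the bundle the bin directories account for. The Python sketch below is a rough aid, not a Splunk tool: the helper names are hypothetical, the apps directory is only an approximation of what ends up in the bundle, and the check (file sits directly inside a directory named bin) only approximates the `.../bin/*` pattern.

```python
import os
from pathlib import PurePosixPath


def split_by_bin(files):
    """files: iterable of (relative_path, size_bytes) pairs.

    Returns (total_bytes, bytes_directly_inside_a_bin_directory).
    """
    total = in_bin = 0
    for rel_path, size in files:
        total += size
        if PurePosixPath(rel_path).parent.name == "bin":
            in_bin += size
    return total, in_bin


def walk_apps(apps_dir):
    """Yield (relative_path, size_bytes) for every file under apps_dir."""
    for root, _dirs, names in os.walk(apps_dir):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, apps_dir).replace(os.sep, "/")
            yield rel, os.path.getsize(full)


# Made-up sample data; on a real search head you could instead pass
# walk_apps(os.path.join(os.environ["SPLUNK_HOME"], "etc", "apps")).
sample = [
    ("search/bin/tool.jar", 9000),
    ("search/default/savedsearches.conf", 1000),
    ("search/lookups/geo.csv", 4000),
]
print(split_by_bin(sample))  # (14000, 9000)
```

If the second number is a large share of the first, the noBinDir entry above is likely to shrink the bundle noticeably.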

Ultra Champion

Also, a little bit of clarification, which may ease your mind: "took too long" makes this sound more like an error, but it isn't - it's just a warning. The replication went well.
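
If you want to track how often and by how much replication runs long, the warnings can be pulled apart with a small Python sketch like the one below. The regular expression is written against the two sample lines in the question only; I am assuming other versions log the same layout, and the function and field names are my own.

```python
import re

# Matches the WARN message body as it appears in the question's splunkd.log.
PATTERN = re.compile(
    r"bundle replication to (?P<peers>\d+) peer\(s\) took too long "
    r"\((?P<ms>\d+)ms\), bundle file size=(?P<kb>\d+)KB, "
    r"replication_id=(?P<rid>\d+)"
)


def parse_warning(line):
    """Return the numeric fields of a 'took too long' warning, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


line = (
    "11-26-2012 19:39:40.745 +0000 WARN DistributedBundleReplicationManager - "
    "bundle replication to 24 peer(s) took too long (18564ms), "
    "bundle file size=13950KB, replication_id=1354804280"
)
event = parse_warning(line)
print(event["ms"], event["kb"])  # 18564 13950
```

Running this over splunkd.log before and after a change to replicationThreads or the blacklist gives a simple way to see whether the change helped.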


Path Finder

Thank you for the clear response. Could you please also elaborate on what conditions must be met for bundle replication to occur?
