I am receiving warnings about how long it takes to copy bundles from the search head to the indexers in a distributed environment. Before I go chasing "mounted bundles", has anyone run into a tipping point where the copied bundles become too large? Splunk, do you have a standard size for this?
Error: DistributedBundleReplicationManager - bundle replication to 4 peer(s) took too long (10197ms), bundle file size=930KB
Judging from this message, 10 seconds is too long to move a 930KB set of files.
The problem is not just the size (930KB is pretty small). It is probably the network speed when copying this bundle to each of your search peers, which has to happen before the search even starts.
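To put a number on "slow", here is a rough sketch that pulls the figures out of the warning quoted above and works out the effective transfer rate. The regular expression is written against that one message only, and the function name is made up for illustration. Whether the four copies go out one after another or in parallel is not stated in the message, so both figures are computed.

```python
import re

# The warning quoted earlier in this thread.
LINE = ("DistributedBundleReplicationManager - bundle replication to 4 peer(s) "
        "took too long (10197ms), bundle file size=930KB")

# Pattern written against the message above; other Splunk versions may word it differently.
PATTERN = re.compile(
    r"bundle replication to (?P<peers>\d+) peer\(s\) took too long "
    r"\((?P<ms>\d+)ms\), bundle file size=(?P<kb>\d+)KB"
)


def parse_replication_warning(line):
    """Return peer count, duration, size and rough throughput, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    peers = int(match.group("peers"))
    seconds = int(match.group("ms")) / 1000.0
    size_kb = int(match.group("kb"))
    return {
        "peers": peers,
        "seconds": seconds,
        "size_kb": size_kb,
        # Rate if the whole duration is attributed to a single copy of the bundle.
        "kb_per_sec_one_copy": size_kb / seconds,
        # Rate if all copies together are counted against the duration.
        "kb_per_sec_all_copies": size_kb * peers / seconds,
    }


stats = parse_replication_warning(LINE)
print(stats)
```

Either way the rate comes out at a few hundred KB/s at best, far below what a healthy LAN delivers, which is why the network (or one slow peer) is the first thing to check.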
Before moving to mounted bundles, please run some speed tests from the search head to each of the search peers; maybe one of the 4 is especially slow.
A workaround is to play with the TTL of your bundles so they are updated less often.
See replication_period_sec in limits.conf:
* The minimum amount of time in seconds between two successive bundle replications.
* Defaults to 60
I increased the TTL by 30% but I still get the error. I will inch it up again and see what happens.
Hello... I too am seeing this "replication took too long" message in our Production deployment, several times an hour. So I have read all that I can on replication, but I am still left with some questions that I'm hoping someone can answer 🙂 ...
1) My messages say my bundle is 520KB, which I gather is pretty small. So, can I / should I simply ignore these messages? How can I determine what impact these "too long replications" are having on my Splunk performance?
2) If I determine that the impact is negligible, is there a config change I can make (e.g., tooLong=30sec) that would increase the "too long" threshold and greatly reduce the frequency of these messages?
3) One of my 8 indexers has over five times as many replication files / as much space used as the other 7 (I'm looking in /…/var/run/searchpeers). What might cause this? Would this have a performance impact?
4) My understanding is that, when a distributed search is initiated, a replication will NOT occur UNLESS: (a) Something in my knowledge objects has changed since the last replication, AND (b) It has been at least replication_period_sec since my last replication. Is my understanding correct?
The search head will replicate the bundle to the search peers once replication_period_sec has elapsed. It tars the bundle and sends the bundle checksum to each search peer; if the checksum differs from the one the search peer already has, replication begins. So whether you see "replication took too long" depends on both the size of the bundle and the speed of the connection.
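The behaviour described above can be sketched as a toy model. This is only an illustration of the logic in this answer, not Splunk's actual implementation: the class names are hypothetical, SHA-256 is an assumed stand-in for whatever checksum Splunk uses, and time is passed in explicitly so the example runs without waiting.

```python
import hashlib


class PeerModel:
    """Toy search peer: remembers the checksum of the last bundle it received."""

    def __init__(self):
        self.checksum = None
        self.copies_received = 0

    def receive(self, bundle_bytes, checksum):
        self.checksum = checksum
        self.copies_received += 1


class SearchHeadModel:
    """Toy search head: replicates at most once per period, and only on checksum change."""

    def __init__(self, peers, replication_period_sec=60):
        self.peers = peers
        self.replication_period_sec = replication_period_sec
        self.last_attempt = None

    def maybe_replicate(self, bundle_bytes, now):
        """Return how many peers were sent a new copy of the bundle."""
        # Condition (b): the minimum period since the last attempt must have elapsed.
        if (self.last_attempt is not None
                and now - self.last_attempt < self.replication_period_sec):
            return 0
        self.last_attempt = now
        # Assumption: SHA-256 stands in for the real bundle checksum.
        checksum = hashlib.sha256(bundle_bytes).hexdigest()
        sent = 0
        for peer in self.peers:
            # Condition (a): only peers holding a different checksum get a copy.
            if peer.checksum != checksum:
                peer.receive(bundle_bytes, checksum)
                sent += 1
        return sent


peers = [PeerModel(), PeerModel()]
head = SearchHeadModel(peers, replication_period_sec=60)
print(head.maybe_replicate(b"bundle v1", now=0))    # first replication, both peers
print(head.maybe_replicate(b"bundle v2", now=30))   # changed, but period not elapsed
print(head.maybe_replicate(b"bundle v2", now=60))   # changed and period elapsed
print(head.maybe_replicate(b"bundle v2", now=120))  # period elapsed, nothing changed
```

In this model an unchanged bundle costs only a checksum comparison, so the full transfer (and the "took too long" warning) should only come up after a knowledge object has changed.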