Hi,
We had a problem today where our filesystem filled up on indexers, caused by many bundles appearing suddenly. I'm not overly familiar with this functionality. How/when do bundles get sent to the indexers? Scheduled? Every search? I see many from the same search-heads, so how does it manage them? Inquiring minds want to know...
If your file system is filling up from bundles, you were likely seeing an extremely large $SPLUNK_HOME/var/run/searchpeers. A search head replicates and distributes its knowledge objects to its search peers in the bundles you see in var/run/searchpeers.
Knowledge objects include saved searches, event types, and other entities used in searching across indexes. The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf. Bundles typically contain a subset of files (configuration files and assets) from $SPLUNK_HOME/etc/system, $SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users. The process of distributing knowledge bundles means that peers by default receive nearly the entire contents of the search head's apps. If an app contains large binaries that do not need to be shared with the peers, that could also be a reason for the large bundle sizes.
You can read more specifically on those bundles here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/Whatsearchheadssend
The best way to mitigate this issue is to reduce the bundle size on the search head itself. This is done with the replication blacklist (just deleting the bundles will only temporarily resolve disk usage problems, as they will get replicated again if they still exist on the SH). The blacklist allows you to limit what is sent to the search peers (indexers) in the knowledge bundle.
We have an entire documentation page on that here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/Limittheknowledgebundlesize
As mentioned earlier, most bin directories, jar and lookup files do not need to be replicated to search peers, and can be blacklisted in distsearch.conf. For example, on the search heads:
$SPLUNK_HOME/etc/system/local/distsearch.conf
[replicationBlacklist]
noBinDir = .../bin/*
jarAndLookups = (jar|lookups)
You can then stop Splunk on each indexer (one at a time) and remove the knowledge bundles in $SPLUNK_HOME/var/run/searchpeers and then start Splunk (the entire contents of $SPLUNK_HOME/var/run/searchpeers can be deleted). The search heads will redistribute the new (reduced size) knowledge bundles.
As an FYI, each indexer keeps 5 knowledge bundles per search head.
Love this answer...
I assume this is all still relevant, as I have had the problem in 7.1.1. Just wanted to note that "noBinDir = .../bin/* " only took care of the bin files in $SPLUNK_HOME/etc/system for me, but when I changed it to ../bin/... it worked anywhere. The same goes for the jar and lookups example he gave, which I couldn't get to work. I am using "noBinOrJars = .../(bin|jars)/...", which works nicely. Another option for lookups is the setting "excludeReplicatedLookupSize = 1" under the [replicationSettings] stanza. I went that way because blacklisting the lookups folder left some lookups missing, which caused search warnings that the lookup could not be found. If you do turn off the lookups folder entirely, you can turn off those errors by going to the individual transforms.conf stanza for each lookup and adding "replicate=false". I also notice there is a "maxBundleSize" option (default 2048, i.e. 2 GB) that you can increase if you have to, maybe just temporarily while you work around the problem, but that's probably not a great idea.
@a212830, good answer? or best answer ever? lol. You good here to "accept" this one?
This should give you details on the knowledge bundle that the search head sends to its search peers/indexers:
http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/Whatsearchheadssend
So, the search-head sends down the delta for every search? I saw continuous large bundles (not just deltas). Does Splunk manage these in any way? Look out for old files?
As I understand it, whether a knowledge bundle is sent, full or delta, depends on which search is being executed, which knowledge objects it uses, and whether any of those knowledge objects have been updated since the last bundle replication. If you're seeing large bundle sizes, I would bet that large lookups are the major contributor. You can copy and untar the latest bundle to see which big files are inflating it. See this page for details on the filters you can apply to limit the bundle size:
http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/Limittheknowledgebundlesize
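If you'd rather not untar the bundle by hand, something like the following Python sketch can list the biggest files inside it. It only assumes the bundle is a tar archive (which is what "untar" above implies); the function name `largest_members` is made up for this example and is not part of Splunk.

```python
import tarfile


def largest_members(bundle_path, top_n=10):
    """Return the top_n largest regular files in a tar archive.

    Each result is a (size_in_bytes, member_name) pair, largest first.
    Assumption: the knowledge bundle is a tar archive, optionally compressed.
    """
    # "r:*" lets tarfile detect compression (or none) on its own.
    with tarfile.open(bundle_path, "r:*") as tar:
        files = [(m.size, m.name) for m in tar.getmembers() if m.isfile()]
    return sorted(files, reverse=True)[:top_n]
```

Run it against a copy of the bundle rather than the live file, for example `largest_members("/tmp/copy-of-latest.bundle")` (a hypothetical path). Large lookup files or binaries near the top of the list are good candidates for the replication blacklist.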
Thanks. When/how do the bundles get deleted?
The searchpeers directory retains up to five replicated bundles from each search head sending requests. If you delete them, they will be created again for the next search that needs that set of configurations.
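To see which retained bundles are eating the disk on an indexer, a rough Python sketch like this can total up each top-level entry under the searchpeers directory. It makes no assumption about how Splunk names the entries; `entry_size` and `searchpeers_usage` are names invented for this example.

```python
import os


def entry_size(path):
    """Total bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks inside the tree
                total += os.path.getsize(full)
    return total


def searchpeers_usage(searchpeers_dir):
    """Return (size_in_bytes, entry_name) for each top-level entry, largest first."""
    usage = [
        (entry_size(os.path.join(searchpeers_dir, name)), name)
        for name in os.listdir(searchpeers_dir)
    ]
    return sorted(usage, reverse=True)
```

Point it at $SPLUNK_HOME/var/run/searchpeers on the indexer (expand the variable yourself, since Python will not do it for you). It only reads sizes, so it is safe to run while Splunk is up.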
Thanks. I'd still like to know how they get there. Do the search-heads send them down every time a search is run? Are they scheduled? Sorry for the level of detail, but I'm going to be asked these questions as a result of the outage.
Both the search head and the search peer maintain a checksum of the configuration available on the search head and of the knowledge bundle previously sent to the search peers (indexers). If the checksum is outdated, the search head sends an updated bundle, full or delta, to the search peer running the search. IMO, the check happens when a search peer is added or when a search is distributed to a search peer.
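To make the checksum idea concrete, here is a small Python sketch of the general technique. This is not Splunk's actual implementation: the hash algorithm, what gets hashed, and the names `tree_checksum` and `needs_new_bundle` are all assumptions chosen for illustration.

```python
import hashlib
import os


def tree_checksum(root):
    """Hash the relative paths and contents of all files under root.

    Illustrative only: Splunk's real checksum may cover different files
    and use a different algorithm.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            digest.update(os.path.relpath(full, root).encode("utf-8"))
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def needs_new_bundle(current_checksum, peer_checksum):
    """True if the peer has no bundle yet or its bundle is out of date."""
    return peer_checksum is None or peer_checksum != current_checksum
```

In this sketch the search head side would compute `tree_checksum` over its configuration, compare it with the checksum last recorded for a peer, and send a bundle only when `needs_new_bundle` returns True. That matches the behaviour described above: no change on the search head means nothing new is sent, while any edited lookup or config file triggers a fresh bundle.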