We recently upgraded to Splunk 6.5.1 and noticed a fairly large increase in the size of the knowledge bundle replicated from the search head to our search peers. After some digging, it appears that the splunk_archiver app has grown significantly since the last version we were on, 6.3.5.
In Splunk 6.3.5 the splunk_archiver app was a total of 108K in the knowledge bundle tar. In Splunk 6.5.1 the splunk_archiver app is a total of 73M, mostly due to some large jar files.
Since the splunk_archiver app is a native Splunk component and already lives on the Splunk Indexers, does the search head really need to be distributing the entire splunk_archiver app to the search peers? This is an additional 73M of data that doesn't seem to be necessary to distribute.
Would it be safe to assume that we can blacklist the splunk_archiver app from distribution in the knowledge bundle without breaking anything?
I had the same question and asked Splunk support for their view on this. They suggested that we can safely blacklist them. I blacklisted only the jar files inside the app and haven't seen any issues in our production environment.
Blacklisting the entire splunk_archiver app from bundle replication doesn't appear to have any repercussions on my production search head. We don't archive data to Hadoop; if you do, this app may need to be distributed to maintain that functionality.
You can blacklist the app with this distsearch.conf stanza:
[replicationBlacklist]
label_here = apps/splunk_archiver/...
Doing this cuts the bundle size sent to the indexers by about 70MB. Since the splunk_archiver app is part of every Splunk install, I'm not sure why Splunk needs to bundle it up by default and send it down to all the indexers - especially because in 90+% of cases folks are not using Hadoop archiving.
Thanks for the response. I'm going to blacklist the entire splunk_archiver app on one of my dev search heads to see if it has any negative impact. I can't imagine that it will, considering the archiver app should only be used on the servers that are indexing data.
https://docs.splunk.com/Documentation/Hunk/6.4.5/Hunk/ArchivingSplunkindexes - This document has a few details about what the app does.
So I just found something interesting with btool.
/opt/splunk/etc/apps/splunk_archiver/default/distsearch.conf [replicationWhitelist]
/opt/splunk/etc/apps/splunk_archiver/default/distsearch.conf javabin = apps/splunk_archiver/java-bin/...
It looks like the Splunk devs purposely want the java-bin .jar files to be replicated, though I have no idea why.
The documentation talks about archiving data into Hadoop. Maybe it needs them if you are using that functionality. We don't use it and did not see any impact from blacklisting them.
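For anyone who wants the narrower, jar-only approach rather than blacklisting the whole app, a distsearch.conf stanza along these lines is a minimal sketch. It assumes the large jar files all live under the java-bin directory shown in the btool output above, and the label archiver_jars is arbitrary (hypothetical, pick your own):

```
# distsearch.conf on the search head
# Assumption: the large .jar files are all under java-bin, as the app's
# default replicationWhitelist entry suggests.
[replicationBlacklist]
archiver_jars = apps/splunk_archiver/java-bin/...
```

Per the distsearch.conf spec, a file that matches both the whitelist and the blacklist is not replicated, so this entry should override the app's default javabin whitelist entry. It is worth checking the bundle size afterwards to confirm the jars are actually excluded.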