Currently half of my search heads are shut down (they shut down automatically due to issues within Splunk), and the remaining ones are not able to query the indexers.
The problem is caused by a large knowledge bundle.
When I checked the .bundle files on the SHs, I found a huge (~340 MB) file containing what looks like a large amount of Python code.
I have maxBundleSize set to 2048 (which is the default).
I have a blacklist in distsearch.conf as below:
[replicationSettings]
maxBundleSize = 2048
[replicationBlacklist]
<name for bin directories> = (.../bin/*)
<name for InstallDirectories> = (.../install/*)
<name for AppServerDirectories> = (.../appserver/*)
<name for allAppUIDirectories> = (.../default/data/ui/*)
<name for allOldDefaultDirectories> = (.../default.old.*)
My question is: is there any way to check which files/apps are included in this bundle that is causing issues, and whether those items are required or can be excluded?
mkdir -p /tmp/support
tar xvf /opt/splunk/var/run/blah.bundle -C /tmp/support
cd /tmp/support
du -h --max-depth=1 |sort -hr |more
Walk it out; the bulk will most likely be in apps/*/lookups.
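To walk it out, one quick way (a sketch, assuming the bundle was extracted to /tmp/support as in the tar step above) to surface the largest individual files:

```shell
# List the 20 largest files in the extracted bundle, biggest first.
# /tmp/support is the extraction directory from the tar command above.
find /tmp/support -type f -exec du -h {} + | sort -hr | head -20
```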
You can blacklist any lookup that is not used as part of an automatic lookup (i.e., referenced in a props.conf statement).
And if you blacklist something and you get an error after the fact, un-blacklist it.
Don't forget that you have a transmit side and a receive side.
In your case the transmit side is your SH, and that is where the distsearch.conf setting maxBundleSize applies.
However, the receive side is your indexers, and that setting lives in server.conf:
[httpServer]
max_content_length = blah
Depending on your version the default might be 800 MB or 2 GB; the value is in bytes, so in the 2 GB case it is written as 2147483648.
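For example, to allow a 2 GB bundle on the receive side (a sketch using the 2147483648 value mentioned above; changes to server.conf typically require a restart):

```ini
# server.conf on the indexers (receive side)
[httpServer]
# Value is in bytes: 2147483648 = 2 GB
max_content_length = 2147483648
```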
Hope this helps.
When I ran the commands you suggested, I got the results below:
2.6G ./apps
2.6G .
328K ./system
56K ./users
48K ./kvstore_s_SA-
Is it safe to blacklist the apps directory entirely? We have a huge dependency on the TA and app for AWS. On further troubleshooting I found that the lookup aws_description.csv is taking up close to 2.3 GB. Is it safe to blacklist the aws_description.csv lookup, given that we require the AWS description data for alerts and reports?
In case I need to blacklist, will the below settings work?
[replicationBlacklist]
<name for lookup directories> = (.../lookups/...)
<name for bin and jardirectories> = (.../(bin|jars)/...)
I deleted my last post because I missed your part about the aws_description.csv being 2.3 GB.
As I mentioned earlier, you need to find out whether that file is being used as part of an automatic lookup in a props.conf statement. If it is not, blacklist the file. If you get errors after the fact, un-blacklist it.
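One way to check (a sketch; note the lookup may be referenced in props.conf via a transforms.conf stanza name rather than the file name, so grep for both) is to run `splunk btool props list --debug | grep -i aws_description` on the SH. If nothing references it at search time on the indexers, a blacklist entry along these lines (the stanza key name is arbitrary, and the pattern follows the same `...` wildcard style as your existing entries) should keep just that file out of the bundle:

```ini
# distsearch.conf on the search head (a sketch)
[replicationBlacklist]
excludeAwsDescription = (.../lookups/aws_description.csv)
```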
And figure out why that csv is so big. You might want to file a support case and work with an AWS SME.
Bottom line is if the lookup is being performed on the SH you don't need the CSV in the bundle.
If you find you do need it, then you need to increase your maxBundleSize and max_content_length, but I would suspect something is wrong if that file is 2.3 GB.
Thanks a lot for your response.