Search heads failing because of huge knowledge bundles

soumyasaha25
Contributor

Currently half of my search heads are shut down (auto shutdown due to issues within Splunk) and the remaining ones are not able to query the indexers.
The problem is caused by a large knowledge bundle.
When I checked the .bundle files on the SHs, I found a huge (~340 MB) file containing what looks like a large amount of Python code.
I have maxBundleSize set to 2048 (which is the default).
I have a blacklist in distsearch.conf, as below:

[replicationSettings]
maxBundleSize = 2048

[replicationBlacklist]
<name for bin directories> = (.../bin/*)
<name for InstallDirectories> = (.../install/*)
<name for AppServerDirectories> = (.../appserver/*)
<name for allAppUIDirectories> = (.../default/data/ui/*)
<name for allOldDefaultDirectories> = (.../default.old.*)

My question is: is there any way to check which files/apps included in this bundle are causing the issue, and whether those items are required or can be excluded?

jwelch_splunk
Splunk Employee

mkdir -p /tmp/support
tar xvf /opt/splunk/var/run/blah.bundle -C /tmp/support
cd /tmp/support
du -h --max-depth=1 |sort -hr |more

Walk down the tree from there; the bulk will most likely be in apps/*/lookups.
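
If you would rather not unpack the whole bundle, here is a rough Python sketch that lists the biggest files straight from the archive and totals the size per app. It assumes the .bundle file is a plain (or gzip/bz2/xz-compressed) tar archive that Python's tarfile module can read, and that apps sit under a top-level apps/ directory, as the apps/*/lookups path above suggests. The function names largest_members and size_per_app are made up for this example.

```python
import tarfile
from collections import defaultdict


def largest_members(bundle_path, top_n=10):
    """Return the top_n (size_in_bytes, name) pairs for regular files in the bundle."""
    # "r:*" lets tarfile detect plain, gzip, bz2 or xz archives on its own.
    with tarfile.open(bundle_path, "r:*") as tar:
        files = [(m.size, m.name) for m in tar.getmembers() if m.isfile()]
    return sorted(files, reverse=True)[:top_n]


def size_per_app(bundle_path):
    """Sum uncompressed file sizes per app (the directory name right under apps/)."""
    totals = defaultdict(int)
    with tarfile.open(bundle_path, "r:*") as tar:
        for m in tar.getmembers():
            # Drop empty and "." components so "./apps/x" and "apps/x" look the same.
            parts = [p for p in m.name.split("/") if p not in ("", ".")]
            if m.isfile() and len(parts) > 2 and parts[0] == "apps":
                totals[parts[1]] += m.size
    return dict(totals)


# Example (the bundle file name is a placeholder, as in the commands above):
# for size, name in largest_members("/opt/splunk/var/run/blah.bundle"):
#     print(f"{size / 1024**2:10.1f} MB  {name}")
```

The sizes reported are the uncompressed member sizes recorded in the archive, so they should line up with what du shows after extraction rather than with the size of the .bundle file itself.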

You can blacklist any lookup that is not:

  1. Used by an automatic lookup defined in props.conf
  2. A CSV that is searched with | lookup, because that is a remote lookup (it runs on the indexers). If you are using | lookup local=true, you could blacklist it.

And if you blacklist it and get an error after the fact, un-blacklist it.
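
To check whether a CSV is tied to an automatic lookup before blacklisting it, something like the following Python sketch can help. It assumes the usual layout where transforms.conf has a stanza with filename = <csv> and props.conf refers to that stanza name as the first token of a LOOKUP-<class> setting. The function names are hypothetical, and root can be the extracted bundle directory (for example /tmp/support). It does not look at explicit | lookup calls in saved searches, so an empty result only rules out automatic lookups.

```python
import os
import re


def _conf_files(root, conf_name):
    """Yield every file called conf_name anywhere below root."""
    for dirpath, _dirs, files in os.walk(root):
        if conf_name in files:
            yield os.path.join(dirpath, conf_name)


def lookup_definitions(root, csv_name):
    """Return the transforms.conf stanza names whose 'filename' is csv_name."""
    pattern = re.compile(r"filename\s*=\s*" + re.escape(csv_name) + r"\s*$")
    names = set()
    for path in _conf_files(root, "transforms.conf"):
        stanza = None
        with open(path, encoding="utf-8", errors="replace") as fh:
            for raw in fh:
                line = raw.strip()
                if line.startswith("[") and line.endswith("]"):
                    stanza = line[1:-1]
                elif stanza and pattern.match(line):
                    names.add(stanza)
    return names


def automatic_lookups(root, csv_name):
    """Return sorted (props.conf path, setting) pairs for LOOKUP-* lines using csv_name."""
    targets = lookup_definitions(root, csv_name)
    hits = []
    for path in _conf_files(root, "props.conf"):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for raw in fh:
                # The first token after "=" is taken to be the transforms.conf stanza name.
                m = re.match(r"\s*LOOKUP-[^=]*=\s*(\S+)", raw)
                if m and m.group(1) in targets:
                    hits.append((path, raw.strip()))
    return sorted(hits)


# Example: automatic_lookups("/tmp/support", "aws_description.csv")
```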

Don't forget that you have a transmit side and a receive side.

In your case the transmit side is your SH, and the distsearch.conf setting maxBundleSize applies there.
However, the receive side is your indexers, and the setting there lives in server.conf:
[httpServer]
max_content_length = blah

Depending on your version, the default might be 800 MB or 2 GB. The value is written in bytes, so 2 GB is 2147483648.
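
Keep in mind that the two settings use different units: maxBundleSize is given in MB, while max_content_length is given in bytes. A small sanity check in Python, as a sketch only (the helper names are hypothetical, and this is plain arithmetic, not how Splunk itself compares the values):

```python
MIB = 1024 * 1024  # bytes per MB as used here


def max_bundle_size_to_bytes(max_bundle_size_mb):
    """Convert a maxBundleSize value (MB) to bytes."""
    return max_bundle_size_mb * MIB


def receiver_accepts(max_bundle_size_mb, max_content_length_bytes):
    """True if a bundle at the sender's size limit still fits the receiver's limit."""
    return max_bundle_size_to_bytes(max_bundle_size_mb) <= max_content_length_bytes


print(max_bundle_size_to_bytes(2048))      # 2147483648
print(receiver_accepts(2048, 800 * MIB))   # False
```

So with maxBundleSize = 2048 on the SH and an 800 MB max_content_length on the indexers, the SH could try to send a bundle that the indexers would not accept.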

Hope this helps.

soumyasaha25
Contributor

When I ran the commands you suggested, I got the results below:
2.6G ./apps
2.6G .
328K ./system
56K ./users
48K ./kvstore_s_SA-
Is it safe to blacklist the apps directory entirely? We have a heavy dependency on the TA and app for AWS. On further troubleshooting I found that the lookup aws_description.csv is taking up close to 2.3 GB. Is it safe to blacklist the aws_description.csv lookup, given that we require AWS description data for alerts and reports?
In case I need to blacklist, will the settings below work?

[replicationBlacklist]
<name for lookup directories> = (.../lookups/...)
<name for bin and jardirectories> = (.../(bin|jars)/...)

jwelch_splunk
Splunk Employee

I deleted my last post because I missed your part about the aws_description.csv being 2.3 GB.

As I mentioned earlier, you need to find out whether that file is being used as part of an automatic lookup in a props.conf statement. If it is not, blacklist the file. If you get errors after the fact, un-blacklist it.

And figure out why that csv is so big. You might want to file a support case and work with an AWS SME.

Bottom line: if the lookup is being performed on the SH, you don't need the CSV in the bundle.

If you find you do need it, then you need to increase your maxBundleSize and max_content_length, but I would suspect something is wrong if that file is 2.3 GB.

soumyasaha25
Contributor

Thanks a lot for your response.
