Deployment Architecture

How to tune ulimits across the deployment?

Glasses2
Communicator

Hi,

I have a distributed, on-prem Splunk Enterprise deployment at 8.1.x.

Splunk is running under systemd.

I recently noticed that the previous admin did not tune the ulimits.

I was wondering if anyone knows how to tune these settings based on each host's role and hardware.

For instance, if I override them with the "systemctl edit Splunkd.service" command:

[Service]
LimitNOFILE=65535             <- tech support suggestion
LimitNPROC=20480              <- tech support suggestion
LimitDATA=(80% of total RAM?) <- tech support suggestion
LimitFSIZE=infinity           <- tech support suggestion
TasksMax=20480                <- mirrored from LimitNPROC per tech support
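As an aside, here is roughly how I have been checking what actually ends up applied after editing the override and restarting (this assumes the unit really is named Splunkd.service):

# what systemd thinks the unit's limits are
systemctl show Splunkd.service -p LimitNOFILE -p LimitNPROC -p LimitDATA -p LimitFSIZE -p TasksMax

# what the running main process actually got
MAINPID=$(systemctl show -p MainPID --value Splunkd.service)
cat /proc/$MAINPID/limits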

I have read the docs and seen the default suggestions for NOFILE and NPROC, but how do you determine the other limits? Specifically NPROC and DATA?

ref: >>> https://docs.splunk.com/Documentation/Splunk/9.0.2/Installation/Systemrequirements#Considerations_re...

Thank you!

Glasses2
Communicator

Ok, thank you for the extra detail.

Currently, all of my hosts without any override.conf (ulimit settings) are displaying out-of-the-box OS limits of >20k to 100k, as seen in the MC (Monitoring Console) under health check > ulimits > ulimits.user_processes (current / recommended):

for idxs (indexers) > 256956

for shs (search heads) > 63437

That doesn't seem "cautious". Also, what do you recommend for LimitDATA?
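In case it's useful, I also poked around on the hosts themselves. If I understand correctly, the kernel scales the default per-user process limit with installed RAM, which would explain why the indexers report much larger numbers than the search heads (this assumes splunkd runs as the splunk user):

# kernel-wide thread cap; this scales with installed RAM
cat /proc/sys/kernel/threads-max

# default per-user process limit the splunk user gets with no override in place
sudo -u splunk bash -c 'ulimit -u'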

TY

PickleRick
SplunkTrust

Quoting https://www.freedesktop.org/software/systemd/man/systemd.exec.html

"Don't use. This limits the allowed address range, not memory use! Defaults to unlimited and should not be lowered. To limit memory use, see MemoryMax= in systemd.resource-control(5)"

BTW, I don't think I use any meaningful MemoryMax setting anywhere (read: one lower than my physical memory). True, I hit the OOM killer from time to time, but I find it "safer" for splunk to crash and get restarted by systemd than for it to start failing silently because it can't allocate memory. I must admit, though, that I'm not 100% sure this isn't a good way to lose data when the OOM killer fires due to over-allocation during ingestion spikes (like huge files being synchronized from a remote source to a UF with no bandwidth limit).
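If you do want to cap splunk's memory the systemd way rather than via LimitDATA, a minimal override sketch would be something like the following. The numbers are placeholders to show the syntax, not a recommendation, and MemoryHigh only applies on the unified cgroup hierarchy (cgroups v2):

# systemctl edit Splunkd.service
[Service]
# soft ceiling - memory gets reclaimed/throttled above this
MemoryHigh=48G
# hard ceiling - going above this triggers the OOM killer for the unit
MemoryMax=56G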

Glasses2
Communicator

Ok, ty. I follow what you are saying and agree... but why don't the Splunk docs cover this better? 😀

PickleRick
SplunkTrust

Process/user limits are not an easy topic (and I'm saying that as an admin with over 20 years of experience). Often the reasonable values depend heavily on how the software being limited is used. So it's not easy to come up with guidance that is both easy for beginners to understand and doesn't get applied blindly in more complicated situations (like the infamous "rule" of making your swap twice your RAM).

PickleRick
SplunkTrust

As I wrote you on Slack - those values are simply significantly raised compared to the defaults and are more or less functionally equivalent to setting the limits to unlimited.

ulimits are meant to stop a single user from eating up all of a machine's resources, which makes sense in a multiuser environment. On a server that runs only one "serious" workload under one user, limiting with ulimits doesn't make much sense. All you can achieve is that the process crashes "internally" because it can't allocate more memory instead of being killed by the OOM killer, but that's pretty much it.

Typical default ulimits in an out-of-the-box OS install are set quite "cautiously", which is why you raise them.
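And if you want to sanity-check how close splunkd actually gets to those limits on a given box (again assuming the unit is named Splunkd.service), systemd already tracks part of it:

# current task count vs. TasksMax (and memory use, if accounting is enabled)
systemctl status Splunkd.service | grep -E 'Tasks|Memory'

# open file descriptors of the main process vs. its NOFILE limit
MAINPID=$(systemctl show -p MainPID --value Splunkd.service)
ls /proc/$MAINPID/fd | wc -l
grep 'Max open files' /proc/$MAINPID/limits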
