Deployment Architecture

How to tune ulimits across the deployment?

Glasses2
Communicator

Hi,

I have a distributed on-prem Splunk Enterprise deployment on 8.1.x.

Splunk is running under systemd.

I recently noticed that the previous admin did not tune the ulimits.

I was wondering if anyone knows how to tune these settings based on each host's role and hardware.

For instance, if I create an override with the "systemctl edit Splunkd.service" command:

[Service]
LimitNOFILE=65535              <- tech support suggestion
LimitNPROC=20480               <- tech support suggestion
LimitDATA=(80% of total RAM?)  <- tech support suggestion
LimitFSIZE=infinity            <- tech support suggestion
TasksMax=20480                 <- mirrors LimitNPROC, per tech support

I have read the docs and seen the default suggestions for NOFILE and NPROC, but how do you determine the other limits? Specifically NPROC and DATA?
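For reference, this is roughly how I would apply and verify such an override (just a sketch, assuming splunkd runs under the default Splunkd.service unit created by boot-start):

# after saving the drop-in with "systemctl edit Splunkd.service"
systemctl daemon-reload
systemctl restart Splunkd.service
# what systemd thinks the limits are
systemctl show Splunkd.service -p LimitNOFILE -p LimitNPROC -p TasksMax
# what the running process actually got
cat /proc/$(pgrep -o -x splunkd)/limits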

ref: https://docs.splunk.com/Documentation/Splunk/9.0.2/Installation/Systemrequirements#Considerations_re...

Thank you!


Glasses2
Communicator

ok thank you for the extra detail.

Currently, all my hosts without any override.conf (ulimit settings) are displaying OoTB OS limits in the 20k to 100k+ range, as seen in the MC under Health Check > ulimits.user_processes (current / recommended):

for idxs: 256956

for shs: 63437

That doesn't seem "cautious". Also, what do you recommend for LimitDATA?
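(As a cross-check on any one host, the MC's ulimits.user_processes figure should match what the running splunkd process reports - something like:)

# compare against what the MC health check shows for this host
grep -E 'Max (processes|open files)' /proc/$(pgrep -o -x splunkd)/limits
systemctl show Splunkd.service -p LimitNPROC -p TasksMax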

TY


PickleRick
SplunkTrust

Quoting https://www.freedesktop.org/software/systemd/man/systemd.exec.html

"Don't use. This limits the allowed address range, not memory use! Defaults to unlimited and should not be lowered. To limit memory use, see MemoryMax= in systemd.resource-control(5)"

BTW, I don't think I use any reasonable (read: lower than my physical memory) MemoryMax setting. True, I hit the OOM killer from time to time, but I find it "safer" if Splunk crashes and gets restarted by systemd than if it were to start failing silently because it can't allocate memory. I must say, though, that I'm not 100% sure it isn't a "good" way to lose data when the OOM killer fires after overallocation during an ingestion spike (like a huge file being synchronized from a remote source to a UF with no bandwidth limit).
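So if the goal really is to cap memory, the drop-in would use MemoryMax= instead of LimitDATA= - a minimal sketch, reusing the 80% figure from the question (that percentage is just an example, not a Splunk recommendation; on older cgroup-v1 systems the equivalent knob is MemoryLimit=):

# /etc/systemd/system/Splunkd.service.d/override.conf (via "systemctl edit Splunkd.service")
[Service]
# percentage is relative to total physical RAM; pick whatever headroom you're comfortable with
MemoryMax=80%
# leave LimitDATA= unset, per the systemd docs quoted above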

Glasses2
Communicator

Ok, ty. I follow what you are saying and agree... but why don't the Splunk docs cover this better? 😀


PickleRick
SplunkTrust

Process/user limits are not an easy topic (and I'm saying that as an admin with over 20 years of experience), and the reasonable values often depend heavily on how the software being limited is used. So it's not easy to come up with guidance that is both easy for beginners to understand and doesn't get applied blindly in more complicated situations (like the infamous "rule" of making your swap twice your RAM).


PickleRick
SplunkTrust

As I wrote to you on Slack - the limits are simply raised significantly compared to the defaults and are more or less functionally equivalent to setting them to unlimited.

ulimits are there to keep a single user from eating up all resources, which makes sense in a multiuser environment. On a server that runs only one "serious" task under one user, limiting with ulimits doesn't make much sense. All you can achieve is that the process crashes "internally" because it cannot allocate more memory instead of being killed by the OOM killer, but that's pretty much it.

Typical default ulimits in an out-of-the-box OS are set quite "cautiously", which is why you raise them.
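(If you want to see where those "cautious" defaults actually live on a given box - a quick sketch, nothing Splunk-specific:)

# login/shell defaults applied by pam_limits
grep -hv '^#' /etc/security/limits.conf /etc/security/limits.d/*.conf 2>/dev/null
# systemd's own defaults, which are what a service like Splunkd.service inherits
systemctl show -p DefaultLimitNOFILE -p DefaultLimitNPROC -p DefaultTasksMax
# kernel-wide ceilings
sysctl fs.file-max kernel.threads-max kernel.pid_max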
