Deployment Architecture

How to tune ulimits across the deployment?

Glasses2
Communicator

Hi,

I have a distributed on-prem Splunk Enterprise deployment at 8.1.x.

Splunk is running under systemd.

I recently noticed that the previous admin did not tune the ulimits.

I was wondering if anyone knows how to tune these settings based on each host's role/hardware.

For instance, if I create an override with the "systemctl edit Splunkd.service" command:

[Service]
LimitNOFILE=65535              <- tech support suggestion
LimitNPROC=20480               <- tech support suggestion
LimitDATA=(80% of total RAM?)  <- tech support suggestion
LimitFSIZE=infinity            <- tech support suggestion
TasksMax=20480                 <- mirrored from LimitNPROC, per tech support

I have read the docs and seen the default suggestions for NOFILE and NPROC, but how do you determine the other limits, specifically NPROC and DATA?
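For context, here is roughly how I have been checking what the running splunkd actually gets - just a sketch, assuming the unit really is named Splunkd.service and that the oldest splunkd process is the main one:

# effective limits of the running splunkd process
cat /proc/$(pgrep -o -x splunkd)/limits

# what the unit is configured to apply
systemctl show Splunkd.service -p LimitNOFILE -p LimitNPROC -p LimitDATA -p LimitFSIZE -p TasksMax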

ref: https://docs.splunk.com/Documentation/Splunk/9.0.2/Installation/Systemrequirements#Considerations_re...

Thank you!


Glasses2
Communicator

OK, thank you for the extra detail.

Currently, all my hosts without any override.conf (ulimit settings) are displaying out-of-the-box OS limits of 20k to 100k+, as seen in the Monitoring Console under Health Check > ulimits > ulimits.user_processes (current / recommended):

for indexers: 256956

for search heads: 63437

That doesn't seem "cautious". What do you recommend for LimitDATA?
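(For reference, I have been cross-checking the MC numbers on the hosts themselves with something like the following - assuming Splunk runs as the splunk user under the Splunkd.service unit:)

# what the unit will apply to splunkd
systemctl show Splunkd.service -p LimitNPROC -p LimitNOFILE -p TasksMax

# what a plain login shell for the splunk user gets from the OS defaults
su - splunk -s /bin/bash -c 'ulimit -u; ulimit -n'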

TY


PickleRick
SplunkTrust

Quoting https://www.freedesktop.org/software/systemd/man/systemd.exec.html

"Don't use. This limits the allowed address range, not memory use! Defaults to unlimited and should not be lowered. To limit memory use, see MemoryMax= in systemd.resource-control(5)"

BTW, I don't think I use any reasonable (read: less than my physical memory) MemoryMax setting. True, I'm hitting the OOM killer from time to time, but I find it "safer" if Splunk crashes and gets restarted by systemd than if it were to start failing silently due to an inability to allocate memory. I must say, though, that I'm not 100% sure it isn't a "good" way to lose data when the OOM killer fires due to overallocation during ingestion spikes (like huge files being synchronized from a remote source on a UF with no bandwidth limit).
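For reference, if you ever did want a hard memory cap, the MemoryMax= the man page points to goes into the same kind of drop-in. A sketch only - the values here are placeholders, not recommendations, and would need to be sized to the host:

[Service]
# cgroup-based memory cap per systemd.resource-control(5), instead of LimitDATA
MemoryMax=48G
# optional soft threshold where the kernel starts reclaiming memory before the hard cap
MemoryHigh=44G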

Glasses2
Communicator

OK, ty, I follow what you are saying and agree... but why don't the Splunk docs cover this better? 😀


PickleRick
SplunkTrust

Process/user limits are not an easy topic (and I'm saying that as an admin with over 20 years of experience). Often the reasonable values are highly dependent on how the limited software is actually used. So it's not easy to come up with guidance that is both easy for beginners to understand and doesn't get applied blindly in more complicated situations (like the infamous "rule" of making your swap twice your RAM).


PickleRick
SplunkTrust

As I wrote to you on Slack - the limits are simply raised significantly compared to the defaults and are more or less functionally equivalent to setting them to unlimited.

ulimits are meant to keep a single user from eating up all the resources, which makes sense in a multiuser environment. On a server that only runs one "serious" task under one user, limiting with ulimits doesn't make much sense. What you can achieve is that the process crashes "internally" because it cannot allocate more memory instead of being killed by the OOM killer, but that's pretty much it.

Typical default ulimits in an out-of-the-box OS are set quite "cautiously", which is why you raise them.
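In practice that boils down to a drop-in along the lines of the one you already sketched, just without LimitDATA. A minimal example, using the values from your tech support case rather than anything I'm prescribing (created with "systemctl edit Splunkd.service"):

[Service]
# raised well above the OS defaults so they effectively stay out of the way
LimitNOFILE=65535
LimitNPROC=20480
LimitFSIZE=infinity
# TasksMax is a cgroup setting, mirrored from LimitNPROC here
TasksMax=20480

The new limits only apply to splunkd after a "systemctl restart Splunkd.service".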
