Hi,
I have a distributed on-prem Splunk Enterprise deployment at 8.1.x.
Splunk is running under systemd.
I recently noticed that the previous admin did not tune the ulimits.
I was wondering if anyone knows how to tune these settings based on individual host role/hardware.
For instance, if I override with the "systemctl edit Splunkd.service" command:
[Service]
LimitNOFILE=65535              <- tech support suggestion
LimitNPROC=20480               <- tech support suggestion
LimitDATA=(80% of total RAM?)  <- tech support suggestion
LimitFSIZE=infinity            <- tech support suggestion
TasksMax=20480                 <- mirrors LimitNPROC, per tech support
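For reference, after applying an override like this, the effective values can be confirmed with something along these lines (a sketch, assuming the Splunkd.service unit name above):

systemctl daemon-reload
systemctl restart Splunkd.service
systemctl show Splunkd.service -p LimitNOFILE -p LimitNPROC -p TasksMax
cat /proc/$(systemctl show -p MainPID --value Splunkd.service)/limits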
I have read the docs and seen the default suggestions for NOFILE and NPROC, but how do you determine the other limits, specifically NPROC and DATA?
Thank you!
Quoting https://www.freedesktop.org/software/systemd/man/systemd.exec.html
"Don't use. This limits the allowed address range, not memory use! Defaults to unlimited and should not be lowered. To limit memory use, see MemoryMax= in systemd.resource-control(5)"
BTW, I don't think I use any reasonable (read: less than my physical memory) MemoryMax setting. True, I'm hitting OOM killers from time to time, but I find it "safer" if Splunk crashes and gets restarted by systemd than if it were to start failing silently due to an inability to allocate memory. I must say, though, that I'm not 100% sure it isn't a "good" way to lose data when the OOM killer fires due to over-allocation during ingestion spikes (like huge files synchronized from a remote source by a UF with no bandwidth limit).
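For completeness, if someone did want a hard memory cap along the lines the man page suggests (instead of LimitDATA), a minimal override sketch could look like the following. The values are purely illustrative, not a recommendation, and on older cgroup v1 hosts the directive is MemoryLimit= instead:

[Service]
# cgroup-based memory cap, per the man page advice (this limits actual memory use, unlike LimitDATA)
MemoryMax=80%
# optional softer threshold where the kernel starts reclaiming before the hard cap is hit
MemoryHigh=75%
# have systemd bring splunkd back if it still gets killed
Restart=on-failure
RestartSec=30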
OK, thank you for the extra detail.
Currently, all my hosts without any override.conf (ulimit settings) are displaying out-of-the-box OS limits of 20k to 100k, as seen in the MC under health check > ulimits > ulimits.user_processes (current / recommended):
for indexers: 256956
for search heads: 63437
That doesn't seem "cautious". Also, what do you recommend for LimitDATA?
TY
OK, TY. I follow what you are saying and agree... but why don't the Splunk docs cover this better? 😀
Process/user limits are not an easy topic (and I'm saying that as an admin with over 20 years of experience). And the reasonable values often depend heavily on how the limited software is used. So it's not easy to come up with guidance that is both easy for beginners to understand and doesn't get applied blindly in more complicated situations (like the infamous "rule" of making your swap twice your RAM).
As I wrote to you on Slack, the limits are simply raised significantly compared to the defaults and are more or less functionally equivalent to setting them to unlimited.
ulimits are for keeping a single user from eating up all resources, which makes sense in a multiuser environment. On a server that only runs one "serious" task under one user, limiting with ulimits doesn't make much sense. What you can achieve is that the process crashes "internally" because it can't allocate more memory instead of being killed by the OOM killer, but that's pretty much it.
Typical default ulimits on an out-of-the-box OS are set quite "cautiously", which is why you raise them.
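As a quick illustration of that gap, you can compare an interactive shell's defaults with what a running splunkd actually got (a sketch; assumes the standard Splunkd.service unit name and systemd's MainPID property):

# interactive shell defaults (often as low as 1024 open files on a stock install)
ulimit -n
ulimit -u
# limits the running splunkd process was actually started with
grep -E 'open files|processes' /proc/$(systemctl show -p MainPID --value Splunkd.service)/limits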