We recently had file-forwarding problems on one of our servers. When we checked the internal logs, we saw the event below:
09-27-2018 10:34:49.688 +0200 INFO ulimit - Limit: open files: 4096 files
host = ABCDSERVER01
source = /app/splunkforwarder/var/log/splunk/splunkd.log
sourcetype = splunkd
However, when I check ulimit on the server as the user that runs the Splunk UF, it shows the value below:
open files (-n) 100000
I'm not sure why the two values differ. Am I missing something here?
If this system uses systemd (like RHEL7), you may be able to fix this by updating the Splunk init script as described at:
Ulimits work a little differently for processes started by systemd, but this change works around that by using su to start Splunk. My understanding is that su causes PAM (and therefore pam_limits) to be invoked, so the correct limits end up being set. I believe Splunk made the change to fix a potential security vulnerability described at https://www.splunk.com/view/SP-CAAAP3M, but it has the convenient side effect of fixing the limits without any changes to the systemd config. There are other systemd-specific ways of solving this problem, but this is a simple fix if it works for you.
You may wish to refer to Why are my ulimits settings not being respected on ubuntu/debian after a server reboot?
When Splunk started, the ulimit was 4096. It is possible that the ulimit changes (for example, those configured in /etc/security/limits.conf) were applied only after Splunk had already started.
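To confirm which limit the running process actually has (as opposed to what ulimit reports in a fresh login shell), you can read /proc/<pid>/limits on Linux. The snippet below is only a rough sketch: the function names are my own, and you need to supply the splunkd PID yourself. With no argument it inspects its own process.

```python
import os
import sys
from pathlib import Path


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because a limit may be 'unlimited'.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max open files' row found")


def read_open_files_limit(pid):
    """Read the open-files limits of a live process (Linux only)."""
    return parse_open_files_limit(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    # Pass the splunkd PID as the first argument; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = read_open_files_limit(pid)
    print(f"pid {pid}: open files soft={soft} hard={hard}")
```

If this shows 4096 for splunkd while your shell shows 100000, the process was started before (or outside of) the environment where the higher limit applies.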
The solution is to log in as the Splunk user on the CLI and restart Splunk. This matters because if the Splunk process is restarted by a deployment server or similar, it will keep the 4096 ulimit: it forks itself, and the new process inherits the current limits. Alternatively, a reboot with the fix from the article mentioned above will also resolve the issue.
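If the inheritance part is not obvious, here is a small illustration (not Splunk-specific, Unix only). It starts a child process with a lowered soft open-files limit and has that child start a grandchild, which then reports the limit it sees. The function name and the value 200 are arbitrary choices of mine, and the sketch assumes your hard limit is at least that high.

```python
import resource
import subprocess
import sys


def grandchild_soft_nofile(soft_limit):
    """Start a child with a lowered soft RLIMIT_NOFILE and return the soft
    limit seen by a process that the child itself starts."""

    def lower_limit():
        # Runs in the child after fork() and before exec().
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    # The grandchild only prints its own soft limit.
    grandchild = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    # The child starts the grandchild and relays its output.
    child = (
        "import subprocess, sys; "
        "print(subprocess.check_output([sys.executable, '-c', %r], text=True).strip())"
        % grandchild
    )
    out = subprocess.check_output(
        [sys.executable, "-c", child], preexec_fn=lower_limit, text=True
    )
    return int(out.strip())


if __name__ == "__main__":
    print("grandchild soft limit:", grandchild_soft_nofile(200))
```

The grandchild never sets a limit itself, yet it reports the value its parent was started with. The same thing happens when splunkd restarts itself: the new process keeps the old limit until it is started from a session that has the higher one.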