I have an indexer on Linux on a physical server, with 100+ forwarders and local file indexing. It is also my deployment server and my search head, with all the users from my AD.
I beefed up the disk size and added multiple cores and RAM... but I still see some complaints at startup. How can I tune it?
Here is my startup log in splunkd.log:
03-03-2011 21:50:09.027 INFO ulimit - Limit: virtual address space size: unlimited
03-03-2011 21:50:09.027 INFO ulimit - Limit: data segment size: 1879048192 bytes [hard maximum: unlimited]
03-03-2011 21:50:09.027 INFO ulimit - Limit: resident memory size: 2147482624 bytes [hard maximum: unlimited]
03-03-2011 21:50:09.027 INFO ulimit - Limit: stack size: 33554432 bytes [hard maximum: 2147483646 bytes]
03-03-2011 21:50:09.027 INFO ulimit - Limit: core file size: 1073741312 bytes [hard maximum: unlimited]
03-03-2011 21:50:09.027 INFO ulimit - Limit: data file size: 2147483646 bytes
03-03-2011 21:50:09.027 ERROR ulimit - Splunk may not work due to low file size limit
03-03-2011 21:50:09.027 INFO ulimit - Limit: open files: 1024
03-03-2011 21:50:09.027 INFO ulimit - Limit: cpu time: unlimited
03-03-2011 21:50:09.029 INFO loader - Splunkd starting (build 95063).
The culprit is your server ulimit. This is a classic problem with Linux boxes: a real dedicated server needs higher limits.
To check your limits:
ulimit -a
# if you are running splunk under another user
su myuserrunningsplunk
ulimit -a
# or restart splunk and check
grep ulimit $SPLUNK_HOME/var/log/splunk/splunkd.log
The most critical values are:
the file size (ulimit -f), because uncompressed bucket files can be very large. This is why splunk was complaining at launch for you.
the number of open files (ulimit -n), also known as the number of file descriptors. This one is very important in your case, because splunk consumes a lot of file descriptors. Increase the value to at least 2048 (depending on your server capacity; I usually multiply by 10, to 10240 or 102400, or even unlimited for dedicated high-end production servers).
the number of user processes (ulimit -u): this one is linked to the number of users and concurrent searches. It is recommended to have more than 1024; 10000 is a good start.
To get an estimate, consider that you may concurrently need a file descriptor for: every forwarder socket, every deployment client socket, each bucket (which can use 10 to 100 files), every running search (up to 3 each), every file being indexed, every connected user... A quick check of current usage is sketched below.
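As a rough sanity check (a sketch, assuming pgrep and lsof are available and the main process is named splunkd), you can count the file descriptors splunkd currently holds:
# count file descriptors currently held by the oldest splunkd process
lsof -p $(pgrep -o splunkd) | wc -l
If that number is anywhere near your open-files limit, raise the limit.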
How to change the file descriptor ulimit, for example to 10240? Edit /etc/security/limits.conf and add:
splunkuser hard nofile 20240
splunkuser soft nofile 10240
Don't forget to restart splunk afterwards, and double-check for ulimit in splunkd.log to confirm that the new value is detected (in particular if splunk is not running as root).
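To verify without digging through the log, you can also read the limits of the running process straight from /proc (a sketch, assuming a standard Linux /proc layout):
# show the limits actually applied to the running splunkd process after the restart
grep -i 'open files' /proc/$(pgrep -o splunkd)/limits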
Remarks:
Re-load the new setting either by restarting the box, or else issue the command:
/sbin/sysctl -p
This one is quite old - but still wrong - therefore the downvote. The sysctl -p command loads kernel settings from /etc/sysctl.conf. That's different from the user-specific settings in /etc/security/limits.conf, which are not read by this command. Instead, one has to log off and log on again.
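For example (assuming splunk runs as a local user named splunk), you can check what a fresh login session would get without logging your own session off:
# start a fresh login shell for the splunk user and print its open-files limit
# (pam_limits runs at login, so this reflects the new limits.conf values)
su - splunk -c 'ulimit -n'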
Also remember, for those running splunk as root, that root has to be explicitly listed in /etc/security/limits.conf and will not match against the '*' wildcard, e.g.:
root soft nofile 10000
root hard nofile 20000
At least today - e.g. in a Red Hat 7.6 environment - wildcard settings are used for the root user also.
Note about Ubuntu: on my 8.04 install, the pam_limits module wasn't enabled by default for the su command, and therefore for the splunk init.d (startup/shutdown) script, which prevented the appropriate limits from being applied. Here's what I had to do:
First, edit /etc/pam.d/su: you will need to add (or uncomment) the following line:
session required pam_limits.so
Second, edit /etc/security/limits.conf: add the following nofile limits for the splunk user (or whatever user your splunkd process runs as):
splunk soft nofile 4096
splunk hard nofile 8196
Note, you may also want to enable pam_limits.so for sudo as well, if you ever use that tool to log in interactively and restart splunk services.
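For reference (a sketch, assuming the standard Debian/Ubuntu PAM layout), the line to add to /etc/pam.d/sudo is the same one used for su:
# /etc/pam.d/sudo -- add (or uncomment) so that limits.conf is honoured for sudo sessions
session required pam_limits.so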
Update:
This appears to still be true for default Ubuntu 12.04 installs.
Amazing detective work.
I do not have a limits.conf file at the location /etc/security/limits.conf; how should I proceed?
When I run the command
splunkuser hard nofile 20240
it says it's not a valid command.
Sorry if this is an invalid question; I am new to this. Please help.
/!\ please read carefully /!\
You can raise the limit temporarily with a simple ulimit -n 10240 (this will not survive a server restart).
So use ulimit -n 20240 ! 😉
In my experience with AWS deployments, I found I needed to change the settings within /etc/security/limits.d:
if I changed only the values in /etc/security/limits.conf, they were overridden by the limits.d contents.
Make sure to check whether any pre-determined limits were set by AWS in limits.d (an example drop-in is sketched below).
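As an illustration (the file name 99-splunk.conf and the splunk user name are just examples; limits.conf is read first and then the limits.d drop-ins, which matches the override behaviour described above), a drop-in could look like:
# /etc/security/limits.d/99-splunk.conf -- example drop-in taking precedence over earlier limits
splunk soft nofile 10240
splunk hard nofile 20240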
Hope it helps,
And the official troubleshooting method:
http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/ulimitErrors
Here is the recommendation:
http://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements#Considerations_re...
To persistently set user limits (on AIX and Ubuntu), edit /etc/security/limits.conf.
(Hard limits are maintained by the kernel while soft limits are enforced by the shell; here we use soft < hard.)
splunkuser hard nofile 20240
splunkuser soft nofile 10240
This is not working after a server reboot.
The deployment server has no link with this particular file descriptor issue; ulimit is a system setting, not a splunk setting.
Any tips on tuning the deployment server settings to reduce the impact on performance in scenarios where an independent deployment server is not possible? I am asking specifically in this context, as I am already familiar with the deployment server settings. I'm specifically curious about settings that will reduce the issues we get related to ulimits. Thanks for any tips!
Remark: the answer was edited; we previously mentioned editing /etc/sysctl.conf instead of /etc/security/limits.conf. You should not touch sysctl, which already has a higher default value.
This is much more stable with openfile = 8096 for me.
I also increased the file size; on 64-bit ext3 I can go up to 2TB, while 32-bit seems limited to 2GB.
http://www.cyberciti.biz/tips/what-is-maximum-partition-size-supported-by-linux.html
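For reference, the file-size limit can also be raised per user in /etc/security/limits.conf (a sketch; the splunk user name is an example, fsize is expressed in KB, and "unlimited" is also accepted):
# /etc/security/limits.conf -- raise the maximum file size for the splunk user
splunk soft fsize unlimited
splunk hard fsize unlimited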