Monitoring Splunk

How can I tune ulimit on my server?

Communicator

I have an indexer on Linux on a physical server, with 100+ forwarders and local file indexing. It's also my deployment server and my search head, with all the users from my AD.

I beefed up the disk size and added multiple cores and RAM, but I still see some complaints at startup. How can I tune it?

here is my starting log in splunkd.log

03-03-2011 21:50:09.027 INFO  ulimit - Limit: virtual address space size: unlimited
03-03-2011 21:50:09.027 INFO  ulimit - Limit: data segment size: 1879048192 bytes [hard maximum: unlimited]
03-03-2011 21:50:09.027 INFO  ulimit - Limit: resident memory size: 2147482624 bytes [hard maximum: unlimited]
03-03-2011 21:50:09.027 INFO  ulimit - Limit: stack size: 33554432 bytes [hard maximum: 2147483646 bytes]
03-03-2011 21:50:09.027 INFO  ulimit - Limit: core file size: 1073741312 bytes [hard maximum: unlimited]
03-03-2011 21:50:09.027 INFO  ulimit - Limit: data file size: 2147483646 bytes
03-03-2011 21:50:09.027 ERROR ulimit - Splunk may not work due to low file size limit
03-03-2011 21:50:09.027 INFO  ulimit - Limit: open files: 1024
03-03-2011 21:50:09.027 INFO  ulimit - Limit: cpu time: unlimited
03-03-2011 21:50:09.029 INFO  loader - Splunkd starting (build 95063).
1 Solution

Splunk Employee

The culprit is your server's ulimit. This is a classic problem with Linux boxes; a dedicated Splunk server needs higher limits.

To check your limits:

ulimit -a
# if you are running splunk under another user
su myuserrunningsplunk -c 'ulimit -a'
# or restart splunk and check
grep ulimit $SPLUNK_HOME/var/log/splunk/splunkd.log

The critical values are:

  • the file size (ulimit -f), because uncompressed bucket files can be very large. This is why Splunk was complaining at launch.

  • the number of open files (ulimit -n), also called the number of file descriptors. This one is very important in your case, because Splunk consumes a lot of file descriptors. Increase the value to at least 2048 (depending on your server capacity; I usually multiply by 10, to 10240 or 102400, or even unlimited for dedicated high-end production servers).

  • the number of user processes (ulimit -u). This one is linked to the number of users and concurrent searches; it is recommended to have more than 1024, and 10000 is a good start.

To get an estimate, consider that you may concurrently need a file descriptor for: every forwarder socket, every deployment client socket, 10 to 100 files per bucket, up to 3 per running search, every file being indexed, every connected user...
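The estimate above can be turned into back-of-envelope arithmetic. All the counts below are hypothetical placeholders for a deployment like the one described, not measured values:

```shell
# Hypothetical counts for a deployment like the one described above.
forwarders=100      # one socket per forwarder
dep_clients=100     # one socket per deployment client
buckets=50          # each bucket can use 10 to 100 files; assume the worst case
searches=10         # up to 3 descriptors per concurrent search
tailed_files=200    # local files being indexed
users=20            # connected users

echo $(( forwarders + dep_clients + buckets * 100 + searches * 3 + tailed_files + users ))
```

Even with conservative numbers, the total lands well above the default open-files limit of 1024, which matches the advice to raise it to 10240 or more.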

How do you change the file descriptor ulimit, for example to 10240?

  • temporarily, with a simple ulimit -n 10240 (this will not survive a server restart)
  • persistently, by editing the user limits in /etc/security/limits.conf (hard limits are maintained by the kernel, while soft limits are enforced by the shell; here we keep soft < hard):
    splunkuser               hard    nofile          20240
    splunkuser               soft    nofile          10240
    

Don't forget to restart Splunk afterwards, and double-check for ulimit in splunkd.log to confirm that the new value is detected (in particular if Splunk is not running as root).
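Another way to double-check (a sketch for Linux, where the /proc layout is assumed) is to read the limits the kernel actually applied to the running process, rather than trusting the shell:

```shell
# /proc/<pid>/limits shows the limits a process really inherited.
# "self" means the current shell; substitute splunkd's PID, e.g.
# $(pgrep -o splunkd), to verify the indexer itself.
grep -i 'open files' /proc/self/limits
```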


Remarks :

  • You may ultimately hit the system-wide limit, which is usually much higher: see fs.file-max in /etc/sysctl.conf (on Debian and Red Hat).
  • Don't run the deployment server in the same instance as your indexer; it will degrade splunkd performance. Instead, run it on another server, or at least in another instance.


Splunk Employee

Re-load the new setting either by restarting the box or else issue command

/sbin/sysctl -p 

Communicator

This one is quite old, but still wrong, hence the downvote.

The sysctl -p command loads kernel settings from /etc/sysctl.conf. That's different from the user-specific settings in /etc/security/limits.conf, which are not read by this command. Instead one has to log off and log on again.
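To make the distinction concrete, here is a sketch showing where each knob lives (standard Linux paths assumed):

```shell
# System-wide ceiling on open files, set via fs.file-max in
# /etc/sysctl.conf and reloaded with `sysctl -p`:
cat /proc/sys/fs/file-max

# Per-session limit, set in /etc/security/limits.conf and applied by
# pam_limits only at the next login:
ulimit -n
```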


Engager

Also remember, for those running Splunk as root, that root has to be explicitly listed in /etc/security/limits.conf and will not match against the '*' wildcard, e.g.:

root soft nofile 10000
root hard nofile 20000

Communicator

At least today, e.g. in a Red Hat 7.6 environment, wildcard settings are applied to the root user as well.


Super Champion

Note about Ubuntu. On my 8.04 install, the "pam_limits" module wasn't enabled by default for the "su" command, and therefore not for the "splunk" init.d (startup/shutdown) script either, which prevented the appropriate limits from being applied. Here's what I had to do:

First, edit /etc/pam.d/su: You will need to add (or uncomment) the following line:

  session    required   pam_limits.so

Second, edit /etc/security/limits.conf: add the following "nofile" limit for the "splunk" user (or whatever user your splunkd process runs as):

splunk          soft    nofile          4096
splunk          hard    nofile          8196

Note: you may also want to enable "pam_limits.so" for "sudo" as well, if you ever use that tool to log in interactively or to restart Splunk services.

Update:

This appears to still be true for default Ubuntu 12.04 installs.

Splunk Employee

Amazing detective work.



Path Finder

I do not have a limits.conf file at /etc/security/limits.conf; how should I proceed?

When I run the command,

splunkuser hard nofile 20240

It says it's not a valid command.

Sorry, if this is an invalid question. I am new to this. Please help.


Path Finder

/!\ please read carefully /!\

temporary with a simple ulimit -n 10240 (this will not survive a server restart)

So use ulimit -n 20240 ! 😉
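One caveat with the temporary approach: a ulimit set in a shell only affects that session and its children, so splunkd must be (re)started from the same session for the change to take effect. A quick sketch of the inheritance behavior:

```shell
# A limit changed in a subshell does not leak back to the parent:
( ulimit -Sn 256; ulimit -Sn )   # the subshell sees 256
ulimit -Sn                       # the parent session is unchanged
```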


Splunk Employee

In my experience with AWS deployments, I found I needed to change the settings within /etc/security/limits.d; if I changed only the values in /etc/security/limits.conf, they got overridden by the limits.d contents.

Make sure to check whether any predetermined limits were set by AWS in limits.d.

Hope it helps,



Builder

persistently edit users limits (on AIX and ubuntu) in /etc/security/limits.conf
( Hard limits are maintained by the kernel while the soft limits are enforced by the shell, here we use the soft < hard)

splunkuser hard nofile 20240
splunkuser soft nofile 10240

This is not working after a server reboot.


Splunk Employee

The deployment server has no link with this particular file descriptor issue.
ulimit is a system setting, not a Splunk setting.


Ultra Champion

Any tips on tuning the deployment server settings to reduce impact to performance in scenarios where an independent deployment server is not possible? I am asking specifically in this context as I am already familiar with settings for deployment server. I'm specifically curious about settings that will reduce issues that we get related to ulimit settings. Thanks for any tips!


Splunk Employee

Remark: the answer was edited; we previously recommended editing /etc/sysctl.conf instead of /etc/security/limits.conf. You should not touch sysctl, which already has a much higher default value.


Communicator

This is much more stable with nofile = 8096 for me.
I also increased the file size limit; on 64-bit ext3 I can go up to 2 TB, while 32-bit seems limited to 2 GB.
http://www.cyberciti.biz/tips/what-is-maximum-partition-size-supported-by-linux.html
