RHEL/CentOS 7 & systemd not honoring ulimits

jethompson_splu
Splunk Employee

A couple of customers have reported that ulimits are not honored when Splunk starts at boot on servers running RHEL 7 or CentOS 7 with systemd.

To give some insight into what happens at boot: the limits configured on the server (for example in /etc/security/limits.conf) are applied by the PAM libraries when a user session is opened, not by the Linux kernel itself. This situation typically comes down to how the user that Splunk runs as was created, and it usually arises when the Splunk service account does not have login permissions. Because no login takes place for that user, the PAM libraries are not called when the Splunk process is started at boot, so the ulimits set on the server are never applied to the Splunk process. This is known and expected behavior on Linux.

1 Solution

jethompson_splu
Splunk Employee

Don't worry, there is a workaround available for this known behavior.

You will first need to identify how the Splunk process is being started on your RHEL/CentOS 7 server: through an init.d script or through a native systemd unit file.

Either of the following commands shows how the service is being started:

service splunk status
or
systemctl status splunk.service

Which script or configuration file needs to be modified depends on the output of these commands. The following is the output from a CentOS 7 server that uses systemctl together with the old init.d script:

$ systemctl status splunk.service
● splunk.service - SYSV: Splunk indexer service
   Loaded: loaded (/etc/rc.d/init.d/splunk; bad; vendor preset: disabled)
   Active: active (exited) since Thu 2017-10-26 07:46:59 CDT; 2 weeks 6 days ago
     Docs: man:systemd-sysv-generator(8)
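As a rough illustration of reading that status output, the following sketch classifies a Loaded line by the path it reports. The function name classify_loaded_line is made up for this example, and the pattern matching assumes the output format shown in this post.

```shell
# Classify how a service is started, based on the "Loaded:" line
# printed by `systemctl status`. Hypothetical helper for illustration.
classify_loaded_line() {
  # Extract the path between "(" and the first ";" (or ")").
  path=$(printf '%s\n' "$1" | sed -n 's/.*Loaded: loaded (\([^;)]*\).*/\1/p')
  case "$path" in
    */init.d/*) echo "init.d script: $path" ;;
    *.service)  echo "systemd unit file: $path" ;;
    *)          echo "unknown: $path" ;;
  esac
}

# Example using the Loaded line from the output in this post.
classify_loaded_line '   Loaded: loaded (/etc/rc.d/init.d/splunk; bad; vendor preset: disabled)'
```

A path under init.d means the init script is the file to edit; a path ending in .service means the unit file is.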

Because the Loaded line in the status output above points at the init.d script, the file to modify is /etc/init.d/splunk (on RHEL/CentOS, /etc/init.d is a symlink to /etc/rc.d/init.d), and the modification is to add the specific ulimits to the startup script. The following lines set the NOFILE (number of file descriptors) ulimit, as a hard limit (-Hn) and a soft limit (-Sn). They go inside the start function of the script:

ulimit -Hn 20240
ulimit -Sn 10240

The ulimit lines are added to the start function like so:

splunk_start() {
  echo Starting Splunk...
  # Set the open-file limits before Splunk is launched.
  ulimit -Hn 20240
  ulimit -Sn 10240
  # ... the rest of the original function stays unchanged ...
}

This sets the limits in the shell that runs the init script, and the Splunk process inherits them when it is started from that shell.
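The reason this works is that limits set with ulimit in a shell are inherited by processes started from that shell. A small self-contained sketch of that behavior follows; the value 128 and the function name child_nofile_after_ulimit are arbitrary choices for the demonstration, and it runs in a subshell so the calling shell's own limits are left alone.

```shell
# Lower the soft open-file limit in a subshell, then start a child
# process and print the limit that the child sees.
child_nofile_after_ulimit() {
  ( ulimit -Sn "$1" && sh -c 'ulimit -Sn' )
}

child_nofile_after_ulimit 128
```

The child reports the value that was set in its parent shell, which is the same mechanism the init script uses for the Splunk process.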

If you are running RHEL/CentOS 7 with a native systemd unit file, the modification is very similar. To find the unit file used for the Splunk process, run:

systemctl status splunk.service

This prints output like that shown earlier in this post, including the location of the unit file used to load the Splunk process. It is usually in /etc/systemd/system, but it can be in a couple of other locations depending on how the OS was installed and configured. The following is an example of the status output for Splunk on a systemd server:

$ systemctl status splunk.service
● splunk.service - SYSV: Splunk indexer service
   Loaded: loaded (/etc/systemd/system/splunk.service; bad; vendor preset: disabled)
   Active: active (exited) since Thu 2017-10-26 07:46:59 CDT; 2 weeks 6 days ago

The following is an example of a systemd unit file prior to the ulimit modifications:

[Unit]
Description=Splunk indexer service
Wants=network.target
After=network.target
Requires=thp-disable.service

[Service]
Type=forking
Restart=on-failure
User=splunk
Group=splunk
ExecStart=/opt/splunk/bin/splunk start
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
StandardOutput=syslog
TimeoutSec=300
PIDFile=/opt/splunk/var/run/splunk/splunkd.pid

[Install]
WantedBy=multi-user.target

To fix ulimits not being honored on a server that uses systemd, add the following options to the [Service] section of the unit file:

LimitNOFILE=<number>
LimitNPROC=<number>
LimitDATA=<number>
LimitFSIZE=<number>

After the modification, the unit file looks like the following:

[Unit]
Description=Splunk indexer service
Wants=network.target
After=network.target
Requires=thp-disable.service

[Service]
Type=forking
Restart=on-failure
User=splunk
Group=splunk
ExecStart=/opt/splunk/bin/splunk start
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
StandardOutput=syslog
LimitNOFILE=65535
LimitNPROC=16384
LimitDATA=85256585
LimitFSIZE=632524157485
TimeoutSec=300
PIDFile=/opt/splunk/var/run/splunk/splunkd.pid

[Install]
WantedBy=multi-user.target
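An alternative to editing the unit file in place, not part of the original answer, is a systemd drop-in file that carries only the Limit settings. The sketch below writes such a file into a scratch directory so nothing under /etc is touched; the function name write_limits_dropin and the file name limits.conf are assumptions for illustration, and the values are the ones from the example unit file above.

```shell
# Write a systemd drop-in that sets resource limits for a service.
# $1 = drop-in directory, $2 = LimitNOFILE value, $3 = LimitNPROC value
# Hypothetical helper; the file name limits.conf is an arbitrary choice.
write_limits_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/limits.conf" <<EOF
[Service]
LimitNOFILE=$2
LimitNPROC=$3
EOF
}

# Demonstration against a scratch directory instead of /etc.
scratch=$(mktemp -d)
write_limits_dropin "$scratch/splunk.service.d" 65535 16384
cat "$scratch/splunk.service.d/limits.conf"
```

On a real server the target directory would be /etc/systemd/system/splunk.service.d, assuming the unit is named splunk.service, and the change would need systemctl daemon-reload followed by a restart of Splunk.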

There will of course be differences between the values shown in this Answers post and the ulimit sizes Splunk recommends. I highly recommend using the following Splunk documentation to determine the appropriate ulimit sizes:

http://docs.splunk.com/Documentation/Splunk/7.0.0/Installation/Systemrequirements#Considerations_reg...

Be sure to check the documentation for the version of Splunk you are running, as the link above is for the version that was current when this was posted. After editing a unit file, run systemctl daemon-reload and restart Splunk so the new limits take effect. With the changes described in this Answers post, ulimits will be honored when Splunk is started during the boot process.
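One way to confirm that the limits were actually applied is to read them back from /proc for the running process. This is a minimal sketch: show_nofile_limit is a hypothetical helper, and it is demonstrated here on the current shell's own PID rather than on splunkd.

```shell
# Print the effective "Max open files" soft and hard limits of a process.
# $1 = PID of the process to inspect (Linux only, reads /proc).
show_nofile_limit() {
  limits_file="/proc/$1/limits"
  if [ ! -r "$limits_file" ]; then
    echo "cannot read $limits_file" >&2
    return 1
  fi
  grep '^Max open files' "$limits_file"
}

# Demonstrate on the current shell.
show_nofile_limit "$$"
```

For Splunk, the PID could be taken from the PIDFile path in the unit file, for example show_nofile_limit "$(head -n 1 /opt/splunk/var/run/splunk/splunkd.pid)", assuming the first line of that file holds the main splunkd PID.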

Sparky1
Explorer

Once this ulimit is set in /etc/init.d/splunk, does it also need to be set in /etc/security/limits.conf to reflect the changes?

jethompson_splu
Splunk Employee

@Sparky1 If you set the ulimit settings inside the /etc/init.d/splunk script, you should not need to modify /etc/security/limits.conf for the user account running Splunk.

By setting the ulimits inside the init.d script or unit file, you are essentially hard-coding the limits that the Splunk process takes when the start or restart commands are issued, so you no longer need to update the system ulimit settings for the user account.

There are situations where central authentication is used, or where the Splunk service account has no login permissions, and in these situations the system ulimits (/etc/security/limits.conf) may not be applied when a process is started at boot time. This is expected behavior on Linux: the PAM libraries are not called for a user that does not have login permissions, or the user belongs to a central authentication system that is not loaded before the Splunk service starts.

In these situations /etc/security/limits.conf is not applied, so you have to hard-code the ulimits inside the init.d script or unit file.

Hopefully this answers your question.

DalJeanis
Legend

@jethompson_splunk - If possible, could you move the answer portion to an answer and accept it, so that the question will show as closed? Thanks, dmj.

nnmiller
Contributor

Another method for the init script, ignoring the systemd aspect and requiring that your version of su on Linux is linked against libpam.so, is to replace:

"/opt/splunk/bin/splunk" start --no-prompt --answer-yes

with the following, replacing <splunk_user> with the correct user for your environment:

su <splunk_user> -c '/opt/splunk/bin/splunk start --no-prompt --answer-yes'

So far I've checked RHEL 6 and CentOS 6, and both include the PAM library in the su binary.
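A quick way to check this on another distribution is to look at the shared libraries the su binary is linked against. The sketch below is only one possible check: su_uses_pam is a made-up helper name, and it only detects dynamic linking as reported by ldd.

```shell
# Report whether the su binary on this host is dynamically linked
# against libpam. Returns 0 if it is, 1 if not, 2 if su is not found.
su_uses_pam() {
  su_path=$(command -v su) || { echo "su not found"; return 2; }
  if ldd "$su_path" 2>/dev/null | grep -q 'libpam\.so'; then
    echo "$su_path links libpam"
  else
    echo "$su_path does not appear to link libpam"
    return 1
  fi
}

su_uses_pam || true
```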

saurabh_tek11
Communicator

So where is the question? Seems like this is a blog post, but a really helpful one 🙂
Double points!?
