As per the various other systemd-related posts on Splunk Answers:
Is there a systemd unit file for Splunk?
Splunk 7.2.2 - systemd - Root privileges required when starting/stopping Splunk?
There has been a lot of confusion since Splunk added the systemd start/stop option in Splunk 7.2.2. However, systemd is required for workload management, so what are the options?
Duane's blog post on Splunk 7.2.2 and systemd does an excellent job of summarising the scenario and solutions.
So the question is: how do we get Splunk to stop/start with the systemd unit file without a modern version of polkit (which is not available on any Red Hat 7.x release), and without using the systemctl stop/start commands? In other words, I want splunk stop/start to work as it did before the systemd unit file was in use.
The "Better Polkit Changes" section of Duane's blog post Splunk 7.2.2 and systemd describes what I found was required on Red Hat 7.5/7.6.
The steps I used were:
splunk enable boot-start -user splunk
This must be run as root. It creates a systemd unit file in the /etc/systemd/system/ directory; I found it is named either Splunkd.service or splunkd.service.
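To see which of the two names boot-start created on a given host, a small check like the one below could be used. It is a minimal sketch: find_splunk_unit is a hypothetical helper name, and it only looks for the two names mentioned above (the unit name follows SPLUNK_SERVER_NAME, so other names are possible).

```bash
#!/bin/bash
# Hypothetical helper: print which Splunk unit file(s) exist in a systemd
# directory. Only the two names discussed above are checked.
find_splunk_unit() {
    local dir="${1:-/etc/systemd/system}"   # default location used by boot-start
    local name found=1
    for name in Splunkd.service splunkd.service; do
        if [[ -f "$dir/$name" ]]; then
            echo "$name"
            found=0
        fi
    done
    return "$found"   # non-zero when neither file exists
}
```

For example, find_splunk_unit prints Splunkd.service if that is the file in /etc/systemd/system/. The optional directory argument is only there so the check can be tried elsewhere.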
By default, this unit file (as of Splunk 7.2.5) results in Splunk being killed when it is shut down or stopped. To avoid this, add the following lines under the [Service] section of the unit file (credit to Splunk Support for this suggestion):
# Send $KillSignal only to the main (splunkd) process; if any child processes are still alive after $TimeoutStopSec, SIGKILL them.
KillMode=mixed
# Splunk doesn't shut down gracefully on SIGTERM
KillSignal=SIGINT
Then, after running systemctl daemon-reload so that systemd picks up the change, Splunk will shut down correctly when stopped with either systemctl stop Splunkd or systemctl stop splunkd, depending on which Splunk version created the unit file.
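One way to confirm that systemd has picked up the new settings is to parse the output of systemctl show. This is a sketch under the assumption that systemctl reports KillSignal as a number (SIGINT is signal 2); check_kill_settings is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: reads "systemctl show" output on stdin and checks that the
# stop-related settings are active. systemd normally prints KillSignal as a
# number (SIGINT is 2); the name form is accepted as well, just in case.
check_kill_settings() {
    local key value mode="" signal=""
    while IFS='=' read -r key value; do
        case "$key" in
            KillMode) mode="$value" ;;
            KillSignal) signal="$value" ;;
        esac
    done
    if [[ "$mode" != "mixed" ]]; then
        echo "KillMode is '$mode', expected mixed" >&2
        return 1
    fi
    if [[ "$signal" != "2" && "$signal" != "SIGINT" ]]; then
        echo "KillSignal is '$signal', expected SIGINT (2)" >&2
        return 1
    fi
    echo "KillMode=mixed and KillSignal=SIGINT are in effect"
}
```

Usage, assuming the unit is called Splunkd: systemctl show -p KillMode -p KillSignal Splunkd | check_kill_settings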
If you want to run splunk stop as the splunk user, you will need to create two files. The first is a polkit rule file (Duane's GitHub has an example):
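// Let the splunk user manage systemd units, but only when the helper script
// approves the calling process.
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        subject.user == "splunk") {
        try {
            // polkit.spawn() throws if the helper exits with a non-zero status
            polkit.spawn(["/usr/local/bin/polkit_splunk", "" + subject.pid]);
            return polkit.Result.YES;
        } catch (error) {
            // Not an approved Splunk start/stop/restart: require admin authentication
            return polkit.Result.AUTH_ADMIN;
        }
    }
});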
This file goes in /etc/polkit-1/rules.d/. If you are running an OS with systemd 226 or newer, you could instead use the "Polkit changes" section of the blog post, as suggested by twinspop; if you are using Red Hat 7.x, use the rule above.
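A comment further down this thread notes that the rule file needs permissions 644, so installing it with an explicit mode avoids surprises from a restrictive umask. This is a minimal sketch: install_polkit_rule and the example file name 99-splunk.rules are assumptions, and the only polkit-specific check is the .rules suffix, which polkit requires before it will load a file.

```bash
#!/bin/bash
# Hypothetical helper: copy a polkit rule file into place with mode 644.
# The rule file name is up to you; polkit only loads names ending in ".rules".
install_polkit_rule() {
    local src="$1"
    local dest_dir="${2:-/etc/polkit-1/rules.d}"
    local name
    name="$(basename "$src")"
    if [[ "$name" != *.rules ]]; then
        echo "refusing to install '$name': polkit ignores files not ending in .rules" >&2
        return 1
    fi
    install -m 644 "$src" "$dest_dir/$name"   # explicit mode, independent of umask
}
```

Run it as root, e.g. install_polkit_rule ./99-splunk.rules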
The second file is the helper script /usr/local/bin/polkit_splunk (also available on GitHub).
Its code is:
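#!/bin/bash -x
# Called by the polkit rule with the PID of the process asking to manage a unit.
# Split that process's command line into words, e.g. "systemctl stop Splunkd".
COMM=($(ps --no-headers -o cmd -p "$1"))
# Approve (exit 0) only start/stop/restart of the Splunk unit; deny anything else.
if [[ "${COMM[1]}" == "start" ]] ||
   [[ "${COMM[1]}" == "stop" ]] ||
   [[ "${COMM[1]}" == "restart" ]]; then
    if [[ "${COMM[2]}" == "Splunkd" ]] ||
       [[ "${COMM[2]}" == "Splunkd.service" ]]; then
        exit 0
    fi
fi
exit 1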
Note that you may need to replace "Splunkd" with "splunkd", depending on your unit file name (which matches the SPLUNK_SERVER_NAME setting in $SPLUNK_HOME/etc/splunk-launch.conf). You will also need to make the script executable:
chmod 755 /usr/local/bin/polkit_splunk
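To check which name your installation uses, the SPLUNK_SERVER_NAME value can be read straight from splunk-launch.conf. This is a sketch assuming SPLUNK_HOME defaults to /opt/splunk when it is unset; splunk_server_name is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print SPLUNK_SERVER_NAME from splunk-launch.conf, which the
# unit file name (and therefore the name polkit_splunk must match) follows.
splunk_server_name() {
    local conf="${1:-${SPLUNK_HOME:-/opt/splunk}/etc/splunk-launch.conf}"
    local name
    # Take the last uncommented SPLUNK_SERVER_NAME=... line and strip whitespace.
    name="$(sed -n 's/^[[:space:]]*SPLUNK_SERVER_NAME[[:space:]]*=[[:space:]]*//p' "$conf" \
            | tail -n 1 | tr -d '[:space:]')"
    if [[ -z "$name" ]]; then
        echo "SPLUNK_SERVER_NAME not set in $conf" >&2
        return 1
    fi
    echo "$name"
}
```

Its output is the value to use in place of "Splunkd" in polkit_splunk.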
Once this is in place, the splunk stop/start commands work fine as the splunk user.
Furthermore, the addition of KillMode=mixed and KillSignal=SIGINT means that splunk stop no longer results in the Splunk process being killed on shutdown.
Finally, if you are using init.d on Red Hat 7.4 or newer, you may wish to test the following scenario:
On Oracle Linux 7.4/7.5 (based on Red Hat 7.4/7.5), using init.d resulted in Splunk terminating on shutdown; the systemd unit files resolve this issue.
In addition, xpac suggests:
# Give Splunk time to shut down; busy indexers especially can take a while
TimeoutStopSec=10min
According to the systemd documentation, the default stop timeout is 90 seconds before SIGKILL is sent to the process; 10 minutes is a more reasonable time for a busy Splunk process to stop.
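As an alternative to editing the generated unit file, the stop-related settings above (KillMode, KillSignal, TimeoutStopSec) could be placed in a systemd drop-in under /etc/systemd/system/<unit>.service.d/, which systemd merges with the unit file. A drop-in is a separate file, so it should not be lost if Splunk rewrites the unit file itself. This approach was not part of the steps above; the function name write_splunk_stop_dropin and the file name splunk-stop.conf are assumptions, and the optional second argument exists only so it can be tried against a scratch directory.

```bash
#!/bin/bash
# Hypothetical helper: write the stop-related settings into a systemd drop-in
# instead of editing the generated unit file. ROOT (2nd argument) is for testing.
write_splunk_stop_dropin() {
    local unit="${1:-Splunkd}"   # use splunkd if that is your unit name
    local root="${2:-}"          # empty means the real filesystem
    local dir="$root/etc/systemd/system/$unit.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/splunk-stop.conf" <<'EOF'
[Service]
# Send KillSignal only to the main splunkd process; SIGKILL leftovers after TimeoutStopSec
KillMode=mixed
# Splunk doesn't shut down gracefully on SIGTERM
KillSignal=SIGINT
# Give busy indexers time to shut down
TimeoutStopSec=10min
EOF
}
```

After writing the drop-in (as root), run systemctl daemon-reload so that systemd picks it up.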
Also, newer versions of systemd have a setting called TasksMax, which defaults to 512 in some systemd versions and therefore needs to be increased within the systemd unit file for Splunk; refer to the systemd documentation for more information.
You can also set the ulimit settings within the systemd unit file so that they apply correctly on OS startup. Here is an example, including TasksMax set to unlimited (these lines also go in the [Service] section):
LimitCORE=0
LimitDATA=infinity
LimitNICE=0
LimitFSIZE=infinity
LimitSIGPENDING=385952
LimitMEMLOCK=65536
LimitRSS=infinity
LimitMSGQUEUE=819200
LimitRTPRIO=0
LimitSTACK=infinity
LimitCPU=infinity
LimitAS=infinity
LimitLOCKS=infinity
LimitNOFILE=1024000
LimitNPROC=512000
TasksMax=infinity
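To confirm that the Limit* settings actually reached the running splunkd, its /proc/<pid>/limits file can be inspected. This is a sketch; soft_limit is a hypothetical helper name, and it prints the soft-limit column for a row such as "Max open files".

```bash
#!/bin/bash
# Hypothetical helper: print the soft limit for one row of a /proc/<pid>/limits
# file, to confirm the Limit* settings reached the running splunkd.
soft_limit() {
    local row="$1"                       # e.g. "Max open files" or "Max processes"
    local file="${2:-/proc/self/limits}"
    awk -v row="$row" '
        index($0, row) == 1 {
            # The first column after the row name is the soft limit
            split(substr($0, length(row) + 1), f, " ")
            print f[1]
            found = 1
            exit
        }
        END { exit !found }
    ' "$file"
}
```

For example, soft_limit "Max open files" /proc/$(pgrep -o -x splunkd)/limits should print 1024000 with the settings above (pgrep -o picks the oldest matching process, assumed here to be the main splunkd).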
An additional note for anyone upgrading an older systemd-enabled Splunk UF or Enterprise server to Splunk 8 or newer: please see "Run Splunk Enterprise as a systemd service", in particular:
If you configured Splunk Enterprise version 7.3.x or earlier to run as a systemd service, upon upgrade to version 8.0.0, on initial start, Splunk Enterprise modifies the existing systemd configuration as follows:
It removes the ExecStartPost and User properties from the Splunkd.service unit file.
It checks the systemd environment, identifies the cgroup path, and automatically sets permissions for the correct cgroup directories.
You can either update your systemd unit file yourself or let Splunk attempt to do it for you (running splunk with sudo may work).
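A quick way to see whether a unit file still contains the ExecStartPost and User properties mentioned in that documentation is to grep for them. This is a minimal sketch; check_pre8_properties is a hypothetical helper name, and the default path assumes the unit is /etc/systemd/system/Splunkd.service.

```bash
#!/bin/bash
# Hypothetical helper: report whether a Splunk unit file still carries the
# ExecStartPost and User properties that Splunk 8.0.0 removes on first start.
check_pre8_properties() {
    local unit_file="${1:-/etc/systemd/system/Splunkd.service}"
    local prop status=0
    if [[ ! -r "$unit_file" ]]; then
        echo "cannot read $unit_file" >&2
        return 2
    fi
    for prop in ExecStartPost User; do
        if grep -q "^[[:space:]]*${prop}=" "$unit_file"; then
            echo "$prop is still present in $unit_file"
            status=1
        fi
    done
    return "$status"   # 0 means neither property is present
}
```

It exits 0 when neither property is present, 1 when at least one is, and 2 if the file cannot be read.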
Has anyone found a good solution to this issue on Ubuntu systems?
Summary of the issue:
Splunk 6.0.0 - Splunk 7.2.1 defaults to using init.d when enabling boot start
Splunk 7.2.2 - Splunk 7.2.9 defaults to using systemd when enabling boot start
Splunk 7.3.0 - Splunk 8.x defaults to using init.d when enabling boot start
systemd defaults to prompting for root credentials upon stop/start/restart of Splunk
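The version ranges above can be expressed as a small lookup, e.g. to work out on a given host whether boot-start will have created a systemd unit. This sketch assumes GNU sort -V for version comparison; default_boot_start and version_ge are hypothetical names, and versions outside the listed ranges are simply classified by the same boundaries.

```bash
#!/bin/bash
# Hypothetical helpers: classify a Splunk version by the boot-start mechanism
# "enable boot-start" defaults to, per the summary above.
version_ge() {   # true when $1 >= $2 (relies on GNU sort -V)
    [[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

default_boot_start() {
    local v="$1"
    if version_ge "$v" 7.2.2 && ! version_ge "$v" 7.3.0; then
        echo systemd   # 7.2.2 up to (not including) 7.3.0
    else
        echo init.d
    fi
}
```

The installed version is normally recorded in $SPLUNK_HOME/etc/splunk.version, so something like default_boot_start "$(sed -n 's/^VERSION=//p' /opt/splunk/etc/splunk.version)" could be used; treat that file layout as an assumption and check it on your install.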
Here is a simple fix if you have encountered this issue and prefer the traditional init.d scripts over systemd.
Splunk Enterprise/Heavy Forwarder example (note: replace the splunk user below with the account you run splunk as):
sudo /opt/splunk/bin/splunk disable boot-start
sudo /opt/splunk/bin/splunk enable boot-start -user splunk -systemd-managed 0
Splunk Universal Forwarder example (note: replace the splunk user below with the account you run splunk as):
sudo /opt/splunkforwarder/bin/splunk disable boot-start
sudo /opt/splunkforwarder/bin/splunk enable boot-start -user splunk -systemd-managed 0
Very nice write-up and excellent investigation. I think it's also worth mentioning (for anyone else who finds this) that if you don't want to use systemd, you can still use the old init.d method by adding the flag -systemd-managed 0
to the boot-start command. More info here: https://docs.splunk.com/Documentation/Splunk/latest/Admin/RunSplunkassystemdservice#Additional_optio...
Perhaps use:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/RunSplunkassystemdservice#Additional_optio...
So you refer to the current version of the documentation?
Done thanks
If all this is known, why doesn't Splunk add it to what it does when you enable boot-start in the first place?
Worked like a charm!
FYI: make sure the file created under /etc/polkit-1/rules.d/ has permissions 644.