We are running a distributed Splunk application on CentOS 7.3. Please find our unit file below:
[Unit]
Description=Splunk Enterprise 6.5.2
After=network.target
Wants=network.target
[Service]
Type=forking
RemainAfterExit=False
User=root
Group=root
LimitNOFILE=65536
ExecStart=/opt/splunk/bin/splunk start --accept-license --answer-yes --no-prompt
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
PIDFile=/opt/splunk/var/run/splunk/splunkd.pid
[Install]
WantedBy=multi-user.target
When we start or restart Splunk on our nodes through systemctl for the first time, the service starts and then stops a short while later. Starting Splunk after that works correctly. What are we missing here? Should we be making additional changes to the unit file?
Thanks in advance,
Keerthana
I have the same kind of issue when I'm using systemd here.
On my side, it happens when I add an indexer (or a search head) to a cluster.
When an indexer joins a cluster, the cluster master sends a configuration bundle and asks Splunk to restart.
It seems that with systemd, Splunk stops properly but does not start again afterward.
You may want to add something like this to the unit file:
Restart=on-failure
RestartSec=30s
But then you will be forced to use systemctl to stop Splunk (otherwise systemd will just start it again after 30 seconds).
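Also, if you edit the unit file in place, systemd won't pick up the change until you reload it; something like this, assuming the unit is named splunk.service:
sudo systemctl daemon-reload
sudo systemctl restart splunk.service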
I'm still looking for another solution, maybe someone else can help here.
Thanks.
Summary of the issue:
Splunk 6.0.0 - Splunk 7.2.1 defaults to using init.d when enabling boot start
Splunk 7.2.2 - Splunk 7.2.9 defaults to using systemd when enabling boot start
Splunk 7.3.0 - Splunk 8.x defaults to using init.d when enabling boot start
systemd defaults to prompting for root credentials upon stop/start/restart of Splunk
Here is a simple fix if you have encountered this issue and prefer the traditional init.d scripts over systemd.
Splunk Enterprise/Heavy Forwarder example (note: replace the splunk user below with the account you run splunk as):
sudo /opt/splunk/bin/splunk disable boot-start
sudo /opt/splunk/bin/splunk enable boot-start -user splunk -systemd-managed 0
Splunk Universal Forwarder example (note: replace the splunk user below with the account you run splunk as):
sudo /opt/splunkforwarder/bin/splunk disable boot-start
sudo /opt/splunkforwarder/bin/splunk enable boot-start -user splunk -systemd-managed 0
Also refer to https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html for a detailed discussion of systemd and starting Splunk without the systemctl command.
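If you want to confirm the switch back to init.d took effect on a RHEL/CentOS style box, check that the init script exists and that no systemd unit was left behind (Splunkd.service is the default generated name; yours may differ):
ls -l /etc/init.d/splunk
chkconfig --list splunk
systemctl list-unit-files | grep -i splunk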
I've encountered this previously, especially on v7.2.x.
One thing you need to change in your unit file is the type.
Set Type=simple in your [Service] stanza, instead of forking.
Also, check the SPLUNK_SERVER_NAME setting in /opt/splunk/etc/splunk-launch.conf
# SPLUNK_OS_USER
#SPLUNK_SERVER_NAME=Splunkd
SPLUNK_OS_USER=splunk
If the value is set there, comment it out and cycle Splunk. The setting there overrides what is in server.conf and causes issues.
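For what it's worth, a quick way to comment it out from the shell (assuming the default /opt/splunk install path; take a backup first):
sudo cp /opt/splunk/etc/splunk-launch.conf /opt/splunk/etc/splunk-launch.conf.bak
sudo sed -i 's/^SPLUNK_SERVER_NAME/#SPLUNK_SERVER_NAME/' /opt/splunk/etc/splunk-launch.conf
sudo /opt/splunk/bin/splunk restart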
Also in 7.2.x, if you run splunk enable boot-start it will generate a properly formed systemd unit file.
Be sure to follow that up by enabling and starting the service:
/opt/splunk/bin/splunk enable boot-start -user root -systemd-managed 1
systemctl enable Splunkd.service
systemctl restart Splunkd.service
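You can then sanity-check that the generated unit is registered and running:
systemctl is-enabled Splunkd.service
systemctl status Splunkd.service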
Hi @codebuilder
Above you say:
"Also, check the SPLUNK_SERVER_NAME setting in /opt/splunk/etc/splunk-launch.conf
...
If the value is set there, comment it out and cycle Splunk. The setting there overrides what is in server.conf and causes issues."
Can you clarify which setting in server.conf is overridden? Or shed any further light on what exactly is happening here? I'm trying to better understand this behavior for Splunk doc purposes. Thanks!
Sure, this can be a bit misleading.
The SPLUNK_SERVER_NAME value contained in splunk-launch.conf does not refer to the name of the server itself as you see it in server.conf. Instead, SPLUNK_SERVER_NAME is the name of the Splunk process/daemon that runs ON your server.
Changing SPLUNK_SERVER_NAME in splunk-launch.conf will have no effect on parameters within server.conf.
Hope this helps.
Apologies, after reading my initial reply I can see that I likely caused the confusion.
To clarify again, changing splunk-launch.conf does not modify server.conf.
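To illustrate the distinction (the serverName value below is a made-up example):
In /opt/splunk/etc/splunk-launch.conf, this names the splunkd process/daemon (and, I believe, the systemd unit name on systemd-managed installs):
SPLUNK_SERVER_NAME=Splunkd
In /opt/splunk/etc/system/local/server.conf, this is the actual server name, and it is unaffected by the line above:
[general]
serverName = idx01.example.com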
Below is the unit file generated by Splunk 7.2.6
#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.
[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target
[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
LimitNOFILE=65536
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunk
Delegate=true
MemoryLimit=100G
CPUShares=1024
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=10min
PermissionsStartOnly=true
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%n"
[Install]
WantedBy=multi-user.target
As per https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html you may want to add TasksMax, and possibly the ulimit settings as well.
Finally, this service file assumes that transparent huge pages have already been disabled elsewhere...
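As a sketch of what that could look like, done as a drop-in override so you don't have to edit the generated file (TasksMax needs a systemd build with the pids controller, and the values here are illustrative, not recommendations):
# sudo systemctl edit Splunkd.service
# (this opens /etc/systemd/system/Splunkd.service.d/override.conf)
[Service]
TasksMax=infinity
LimitNPROC=16384
If you create the override file by hand instead, follow it with a sudo systemctl daemon-reload.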
My two pennies worth.
I have the Debian package installed in my home lab, and it seems to use systemd by default now. If I restart splunkd as my install user (which is called siem), I am prompted for the root password, and then a message says I have to restart as root using systemctl.

Back when I used init instead, it was important to restart Splunk as the installation user, siem, otherwise Splunk would not start properly. I think that was because the ownership of a file somewhere under the installation tree at /opt/splunk had changed (a lock file?). I say that because a "chown -R siem:siem /opt/splunk" fixed the issue and the siem user could restart Splunk again. This is a common issue for us in production, caused by others upgrading systems and the way they shut down and start the services, none the wiser that this would then break the Splunk installation. (These are rpm-based systems still using init.)

It is a similar issue if someone installs Splunk as the default user (splunk): the siem user cannot start Splunk until a "chown -R siem:siem /opt/splunk".

So I wonder if systemd is causing a similar issue, as it appears to force the Splunk service to be started as root rather than as the user Splunk was installed under. And if you are restarting remotely, perhaps the prompt for the root password is never seen, so Splunk cannot restart? Maybe an expect script over ssh would be a remote workaround, but that is not ideal.

Maybe sudo is the answer, but that would be a whole lot of servers to manage, it does not fit the company's security policy, and getting the root password is an absolute pain procedure-wise. We run a tight ship. I'm hoping I can force a legacy startup until Splunk can advise how to install Splunk Enterprise under a specific user and still restart Splunk as that user when we need to. Otherwise Splunk just becomes a lump painting us into a corner.

Fortunately we are still using init in production; I hope it stays that way.
Refer to https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html for a more detailed answer. You can use init.d as per chrisyounger's answer on the post.
Or you can get systemd + Splunk working nicely on most modern OSes.
I am running Splunk 7.0.2 on RHEL 7.4 and use the following splunk.service for systemd.
[Unit]
Description=Splunk Enterprise 7.0.2
After=network.target
Wants=network.target
[Service]
Type=forking
User=splunk
Group=splunk
LimitNOFILE=65536
ExecStart=/opt/splunk/bin/splunk start
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
PIDFile=/opt/splunk/var/run/splunk/splunkd.pid
[Install]
WantedBy=multi-user.target
# If you want to use $(systemctl [start|stop|restart] splunk) instead of splunkd ...
Alias=splunk.service
This runs splunk as the splunk user so you need to ensure that splunk owns all the files in your $SPLUNK_HOME dir.
This works fine when all the splunk ports are above 1024.
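If you do need a port below 1024 (syslog on 514, say) while still running as the splunk user, one option on newer systemd versions (v229 or later, so not stock CentOS 7) is to grant the bind capability in the [Service] stanza; this is a sketch rather than anything from the Splunk docs:
AmbientCapabilities=CAP_NET_BIND_SERVICE
On older systemd you are left with running as root, keeping ports above 1024, or an iptables redirect.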
I tried with your file (mine was almost the same) and I get the same result.
To reproduce the issue, Splunk needs to be started with systemctl (it works if you do a /opt/splunk/bin/splunk start); then try to restart Splunk from the web interface.
Splunk will shut down but will not start again.
I have researched this issue for a very long time, and so far there still isn't a perfect solution:
The only way to get it to reliably restart is to set
Restart=always
Ironically, this also seems to be the recommended setting in Splunk's own generated systemd unit with splunkd support.
The downside is that you will not be able to control start/stop using the splunk binary.
@kundeng I've used my own instructions https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html for 7.3.x and you can also refer to https://docs.splunk.com/Documentation/Splunk/latest/Admin/RunSplunkassystemdservice
"The downside is that you will not be able to control start/stop using splunk binary. "
That's not true; as per my instructions I have it working with the splunk binary, as do others.
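For context, my understanding of why splunk stop can coexist with Restart=always in the generated unit shown earlier is the exit-status mapping it ships with:
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
splunkd exits 51 on a requested stop (so systemd leaves it down) and 52 when it wants systemd to restart it. A hand-rolled unit with Restart=always but without these lines will indeed fight splunk stop.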
Hi,
What is not true? Can you be specific? Thanks.
What I meant is that with that specific setting (Restart=always), "splunk stop" won't stop Splunk.
It seems that you have made some kind of polkit changes to get around systemd-enabled Splunk asking for a password. I am sure it is possible on certain Linux distributions and will definitely look into it. Note that this is not an official solution from Splunk as far as I know; I have dealt with many Splunk consultants whose suggestion is always "don't enable it".
My frustration is that I haven't seen any other software as frustrating as Splunk when it comes to start/stop. The best direction going forward is for Splunk to fix the damn issue, as we are paying them big bucks. No?
The specific point is that with systemd enabled, splunk stop and splunk start work just fine as a non-root user (the splunk user) on my servers.
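For anyone curious, the polkit piece generally looks something like the rule below. This is a sketch: the path, unit name, and user are assumptions, and older systemd builds (such as stock RHEL/CentOS 7) do not pass the unit name through to polkit, so there you can only match the manage-units action as a whole.
// /etc/polkit-1/rules.d/10-Splunkd.rules
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        action.lookup("unit") == "Splunkd.service" &&
        subject.user == "splunk") {
        return polkit.Result.YES;
    }
});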
If you can describe the issue well, then raise an idea on ideas.splunk.com/ and vote for it! The documentation for ideas is worth a read if you haven't used the site before.
Did you try ./splunk enable boot-start?
That didn't quite work, so we are using systemctl enable splunk instead.
When Splunk stops in a short while, what is in splunkd.log?
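Both of these are worth capturing right after the failed start (assuming the default install path and a unit named splunk.service):
sudo tail -n 100 /opt/splunk/var/log/splunk/splunkd.log
journalctl -u splunk.service --since "10 minutes ago"
journalctl usually shows the exit status systemd saw, which splunkd.log alone won't tell you.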