Deployment Architecture

Systemd start/restart for Splunk not working as expected (CentOS 7.3)

keerthana_k
Communicator

We are running a distributed Splunk application on CentOS 7.3. Our unit file is below:

[Unit]
Description=Splunk Enterprise 6.5.2
After=network.target
Wants=network.target

[Service]
Type=forking
RemainAfterExit=False
User=root
Group=root
LimitNOFILE=65536
ExecStart=/opt/splunk/bin/splunk start --accept-license --answer-yes --no-prompt
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
PIDFile=/opt/splunk/var/run/splunk/splunkd.pid

[Install]
WantedBy=multi-user.target

When we start or restart Splunk on our nodes through systemctl for the first time, the service starts and then stops shortly afterwards. Starting Splunk again after that works correctly. What are we missing here? Should we make additional changes to the unit file?

Thanks in advance,
Keerthana

groland
Explorer

I have the same kind of issue when using systemd here.
On my side, it happens when I add an indexer (or a search head) to a cluster.

When an indexer joins a cluster, the cluster master sends a configuration bundle and asks Splunk to restart.
With systemd, it seems Splunk stops properly but does not start again afterwards.

You may want to add something like this to the unit file:
Restart=on-failure
RestartSec=30s

But you will then be forced to use systemctl to stop Splunk (otherwise, it will be started again after 30s).
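
A minimal sketch of adding those two settings through a systemd drop-in instead of editing the unit file itself. The unit name splunk.service, the file name restart.conf and the helper name write_restart_dropin are assumptions for illustration, not something Splunk ships.

```shell
# Write a drop-in that adds the restart policy to an existing unit.
# $1: drop-in directory, e.g. /etc/systemd/system/splunk.service.d
#     (assumes the unit is called splunk.service)
write_restart_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=30s
EOF
}

# Typical use as root, followed by a reload so systemd picks up the drop-in:
#   write_restart_dropin /etc/systemd/system/splunk.service.d
#   systemctl daemon-reload
#   systemctl restart splunk.service
```

Keeping the change in a drop-in means the original unit file stays untouched and the override can be removed by deleting one file.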

I'm still looking for another solution, maybe someone else can help here.

Thanks.

bandit
Motivator

Summary of the issue:
Splunk 6.0.0 - Splunk 7.2.1 defaults to using init.d when enabling boot start
Splunk 7.2.2 - Splunk 7.2.9 defaults to using systemd when enabling boot start
Splunk 7.3.0 - Splunk 8.x defaults to using init.d when enabling boot start

systemd defaults to prompting for root credentials upon stop/start/restart of Splunk
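
As a rough sketch, the version ranges above can be turned into a lookup. The function names are hypothetical, the gap between 7.2.9 and 7.3.0 is treated as systemd, and anything outside 6.0.0 to 8.x is reported as unknown because the summary does not cover it. It relies on GNU sort -V for version ordering.

```shell
# version_ge A B: succeeds when version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the default boot-start mechanism for a Splunk version such as 7.2.6,
# following the ranges in the summary above.
default_boot_start() {
    v="$1"
    if ! version_ge "$v" 6.0.0; then
        echo unknown
    elif ! version_ge "$v" 7.2.2; then
        echo init.d
    elif ! version_ge "$v" 7.3.0; then
        echo systemd
    elif ! version_ge "$v" 9.0.0; then
        echo init.d
    else
        echo unknown    # not covered by the summary
    fi
}

# Example: default_boot_start 7.2.6
```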

Here is a simple fix if you have encountered this issue and prefer to use the traditional init.d scripts vs systemd.

Splunk Enterprise/Heavy Forwarder example (note: replace the splunk user below with the account you run splunk as):

sudo /opt/splunk/bin/splunk disable boot-start
sudo /opt/splunk/bin/splunk enable boot-start -user splunk -systemd-managed 0

Splunk Universal Forwarder example (note: replace the splunk user below with the account you run splunk as):

sudo /opt/splunkforwarder/bin/splunk disable boot-start
sudo /opt/splunkforwarder/bin/splunk enable boot-start -user splunk -systemd-managed 0
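
After switching, it may help to confirm which mechanism is actually in place. This is only a sketch: it assumes the init script is /etc/init.d/splunk and the systemd unit is /etc/systemd/system/Splunkd.service (the unit name used elsewhere in this thread). Your install may use different names, especially for the Universal Forwarder. The function name boot_start_mode is made up for this example.

```shell
# Print which boot-start mechanism appears to be installed.
# $1: optional filesystem root prefix (empty means the real root).
# If both files exist, the systemd unit is reported.
boot_start_mode() {
    root="${1:-}"
    if [ -f "$root/etc/systemd/system/Splunkd.service" ]; then
        echo systemd
    elif [ -f "$root/etc/init.d/splunk" ]; then
        echo init.d
    else
        echo none
    fi
}

# Example: boot_start_mode
```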

gjanders
SplunkTrust

codebuilder
Influencer

I've encountered this previously, especially on v7.2.x.

One thing you need to change in your unit file is the type.
Set Type=simple in your [Service] stanza, instead of forking.

Also, check the SPLUNK_SERVER_NAME setting in /opt/splunk/etc/splunk-launch.conf

# SPLUNK_OS_USER
#SPLUNK_SERVER_NAME=Splunkd
SPLUNK_OS_USER=splunk

If the value is set there, comment it out and cycle Splunk. The setting there overrides what is in server.conf and causes issues.
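
A quick way to check whether the setting is active (uncommented) is a grep on the file. This is a sketch; the helper name server_name_is_set is made up, and the default path is the one quoted above.

```shell
# Succeeds (exit 0) when SPLUNK_SERVER_NAME is set on an uncommented line.
# $1: path to splunk-launch.conf (defaults to the path quoted above)
server_name_is_set() {
    conf="${1:-/opt/splunk/etc/splunk-launch.conf}"
    grep -Eq '^[[:space:]]*SPLUNK_SERVER_NAME[[:space:]]*=' "$conf"
}

# Example:
#   if server_name_is_set; then echo "SPLUNK_SERVER_NAME is set"; fi
```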

Also in 7.2.x, if you run splunk enable boot-start it will generate a properly formed systemd unit file.
Be sure to follow that up with enable and start.

/opt/splunk/bin/splunk enable boot-start -user root -systemd-managed 1
systemctl enable Splunkd.service
systemctl restart Splunkd.service
----
An upvote would be appreciated and Accept Solution if it helps!

sroback_splunk
Splunk Employee

Hi @codebuilder

Above you say:

"Also, check the SPLUNK_SERVER_NAME setting in /opt/splunk/etc/splunk-launch.conf
...
If the value is set there, comment it out and cycle Splunk. The setting there overrides what is in server.conf and causes issues."

Can you clarify which setting in server.conf is overridden? Or shed any further light on what exactly is happening here? I'm trying to better understand this behavior for Splunk doc purposes. Thanks!

codebuilder
Influencer

Sure, this can be a bit misleading.
The SPLUNK_SERVER_NAME value contained in splunk-launch.conf does not refer to the name of the server itself as you see in server.conf. Instead SPLUNK_SERVER_NAME is the name of the Splunk process/daemon that runs ON your server.

Changing SPLUNK_SERVER_NAME in splunk-launch.conf will have no effect on parameters within server.conf.

Hope this helps.

codebuilder
Influencer

Apologies, after reading my initial reply I can see that I likely caused the confusion.
To clarify again, changing splunk-launch.conf does not modify server.conf.

codebuilder
Influencer

Below is the unit file generated by Splunk 7.2.6

#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
LimitNOFILE=65536
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunk
Delegate=true
MemoryLimit=100G
CPUShares=1024
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=10min
PermissionsStartOnly=true
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%n"

[Install]
WantedBy=multi-user.target
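
To compare a hand-written unit file with the generated one, it can be handy to pull out single directives such as Type or Restart. A small sketch follows; unit_setting is a hypothetical helper, and it only handles plain Key=Value lines.

```shell
# Print the value(s) of one directive from a unit file.
# $1: path to the unit file, $2: directive name, e.g. Type or Restart
unit_setting() {
    sed -n "s/^$2=//p" "$1"
}

# Example (the path is an assumption about where the unit was written):
#   unit_setting /etc/systemd/system/Splunkd.service Type
#   unit_setting /etc/systemd/system/Splunkd.service Restart
```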

gjanders
SplunkTrust

As per https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html you may want to add in TasksMax, and also possibly the ulimit settings.

Finally, this service would assume transparent huge pages are disabled...
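
For the transparent huge pages point, the kernel exposes the current mode in /sys/kernel/mm/transparent_hugepage/enabled, with the active value in square brackets. A minimal sketch for reading it (thp_mode is a made-up helper name):

```shell
# Print the active transparent huge pages mode: always, madvise or never.
# The file looks like "always madvise [never]"; the bracketed word is active.
# $1: optional path override (defaults to the standard sysfs location)
thp_mode() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$thp_file"
}

# Example: thp_mode
```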

neilhaywood
Engager

My two pennies' worth.

I have the Debian package installed in my home lab, and it seems to use systemd by default now. If I restart splunkd as my install user (called siem), I am prompted for the root password, and then a message says I have to restart as root using systemctl.

Back when I used init, it was important to restart Splunk as the installation user, siem; otherwise Splunk would not start properly. I think this was because ownership of a file (a lock file?) somewhere under /opt/splunk had changed. I say that because a "chown -R siem:siem /opt/splunk" fixed the issue and the siem user could restart Splunk again. This is a common issue for us in production. It was caused by others upgrading systems and by the way they shut down and started the services, unaware that this would cause an issue with the Splunk installation. (These are rpm-based systems still using init.)

There is a similar issue if someone installs Splunk as the default user (splunk): the siem user cannot start Splunk until "chown -R siem:siem /opt/splunk" is run.
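
Before running a recursive chown, it can be useful to see whether anything under the install directory is actually owned by someone else. A sketch using find; the helper name not_owned_by is hypothetical and the siem user is just the example from this post.

```shell
# List paths under a directory that are NOT owned by the expected user.
# $1: directory to scan, $2: expected owner (name or numeric uid),
# $3: maximum number of paths to print (default 10)
not_owned_by() {
    find "$1" ! -user "$2" -print 2>/dev/null | head -n "${3:-10}"
}

# Example: not_owned_by /opt/splunk siem
# Empty output means everything that could be read is owned by that user.
```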

So I wonder if systemd is causing a similar issue, as it appears to force the Splunk service to be started as root rather than as the user Splunk was installed under. And when restarting remotely, perhaps the prompt for the root password is not seen, so Splunk cannot restart? Maybe an expect script over ssh would work as a remote solution, but that is not ideal.

Maybe sudo is the answer, but that would mean a whole lot of servers to manage, it does not fit with the company's security policy, and getting the root password is an absolute pain procedure-wise. We run a tight ship. I'm hoping I can force a legacy startup until Splunk can advise how to install Splunk Enterprise under a specific user and still restart Splunk as that user when we need to. Otherwise Splunk just becomes a lump painting us into a corner.

Fortunately we are still using init in production; I hope it stays that way.

gjanders
SplunkTrust

Refer to https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html for a more detailed answer. You can use init.d as per chrisyounger's answer on the post.

Or you can get systemd + Splunk working nicely on most modern OSes.

neilsquires
Engager

I am running Splunk 7.0.2 on RHEL 7.4 and use the following splunk.service for systemd.

[Unit]
Description=Splunk Enterprise 7.0.2
After=network.target
Wants=network.target

[Service]
Type=forking
User=splunk
Group=splunk
LimitNOFILE=65536
ExecStart=/opt/splunk/bin/splunk start
ExecStop=/opt/splunk/bin/splunk stop
ExecReload=/opt/splunk/bin/splunk restart
PIDFile=/opt/splunk/var/run/splunk/splunkd.pid

[Install]
WantedBy=multi-user.target
# If you want to use $(systemctl [start|stop|restart] splunk) instead of splunkd ...
Alias=splunk.service

This runs splunk as the splunk user so you need to ensure that splunk owns all the files in your $SPLUNK_HOME dir.

This works fine when all the splunk ports are above 1024.

groland
Explorer

I tried your file (mine was almost the same) and I get the same result.
To reproduce the issue, Splunk needs to be started with systemctl (it works if you do a /opt/splunk/bin/splunk start); then try to restart Splunk from the web interface.

Splunk will shut down but will not restart.

kundeng
Path Finder

I have researched this issue for a very long time. So far there still isn't a perfect solution:

  1. Linux system with only systemd support (and auto routing to the /etc/init.d script), but without actual SysV.
    Restart=on-failure will cause Splunk to stop but not restart when an internal Splunk restart is invoked, in situations like a rolling restart.

    The only way to get it to reliably restart is to set
    Restart=always

    Ironically, this also seems to be the recommended setting in Splunk's systemctl config with splunkd binary support.

    The downside is that you will not be able to control start/stop using the splunk binary.

  2. Using Splunk's splunkd systemctl support.
    This was the default in 7.2 and was then removed as the default in >=7.3. I was told by PS not to enable it due to other issues. One issue is of course that you need the root password to start/stop (though you can get around that).

gjanders
SplunkTrust

@kundeng I've used my own instructions https://answers.splunk.com/answers/738877/splunk-systemd-unit-file-in-versions-722-and-newer.html for 7.3.x and you can also refer to https://docs.splunk.com/Documentation/Splunk/latest/Admin/RunSplunkassystemdservice

"The downside is that you will not be able to control start/stop using splunk binary. "

That's not true, as per my instructions I have it working as do others using the splunk binary.

kundeng
Path Finder

Hi,
What is not true? Can you be specific? Thanks.

What I meant is that when I have that specific setting (Restart=always), "splunk stop" won't stop Splunk.

It seems that you have made some kind of polkit changes to get around systemd-enabled Splunk asking for a password. I am sure it is possible for certain Linux distributions and I will definitely look into it. Note that, as far as I know, this is not an official solution from Splunk; I have dealt with many Splunk consultants whose suggestion is always "don't enable it".

My frustration is that I haven't seen any other software as frustrating as Splunk in terms of start/stop. The best direction going forward is for Splunk to fix the damn issue, as we are paying them big bucks. No?

gjanders
SplunkTrust

The specific point is that with systemd enabled, splunk stop and splunk start work just fine as a non-root user (the splunk user) on my servers.

If you can describe the issue well, raise an idea on ideas.splunk.com/ and vote for it! The documentation for ideas is worth a read if you haven't used the site before.

mayurr98
Super Champion

Did you try ./splunk enable boot-start?

keerthana_k
Communicator

That didn't quite work so we are using systemctl enable splunk instead.

richgalloway
SplunkTrust

When Splunk stops in a short while, what is in splunkd.log?
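
One way to pull the relevant lines out of the log is sketched below. It assumes the default location /opt/splunk/var/log/splunk/splunkd.log and that each line carries the log level (WARN, ERROR, FATAL) as its own space-separated field. The helper name recent_splunkd_problems is made up for this example.

```shell
# Print the last N WARN/ERROR/FATAL lines from a splunkd.log file.
# $1: log file (defaults to the assumed standard location)
# $2: number of lines to show (default 20)
recent_splunkd_problems() {
    logfile="${1:-/opt/splunk/var/log/splunk/splunkd.log}"
    count="${2:-20}"
    grep -E ' (WARN|ERROR|FATAL) ' "$logfile" | tail -n "$count"
}

# Example: recent_splunkd_problems
# The systemd side of the story may be visible with (unit name assumed):
#   journalctl -u splunk.service
```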

---
If this reply helps you, Karma would be appreciated.