Splunk doesn't start on a cgroup2-only system

daubsi_2
Explorer

I have installed Splunk on a cgroup1/2 hybrid system using "enable boot-start systemd-managed 1" to start it on bootup.

Yesterday I switched to a cgroup2-only system by disabling cgroup1 via grub/kernel boot parameters.

Now Splunk no longer starts because a path in the cgroup1 file system hierarchy is no longer present:

Jan 22 10:25:54 bigigloo systemd[1]: Stopping Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:30:58 bigigloo systemd[1]: Splunkd.service: Killing process 2847689 (python3.7) with signal SIGKILL.
Jan 22 10:30:58 bigigloo systemd[1]: Splunkd.service: Succeeded.
Jan 22 10:30:58 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
-- Reboot --
Jan 22 10:36:19 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:19 bigigloo bash[3180]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3393 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3408 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:22 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo bash[3475]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 1.
Jan 22 10:36:23 bigigloo bash[3480]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:23 bigigloo bash[3496]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3476 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3477 (btool) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:22 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 2.
Jan 22 10:36:22 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3481 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3482 (btool) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:22 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 3.
Jan 22 10:36:23 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Killing process 3497 (sh) with signal SIGKILL.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Killing process 3499 (btool) with signal SIGKILL.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:23 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 4.
Jan 22 10:36:23 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...

I tracked the problem down to the two ExecStartPost commands in the unit file /etc/systemd/system/Splunkd.service. Commenting out those two lines fixed the problem.

#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=360
LimitNOFILE=65536
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=root
Group=root
Delegate=true
CPUShares=1024
MemoryLimit=20868083712
PermissionsStartOnly=true
#ExecStartPost=/bin/bash -c "chown -R root:root /sys/fs/cgroup/cpu/system.slice/%n"
#ExecStartPost=/bin/bash -c "chown -R root:root /sys/fs/cgroup/memory/system.slice/%n"

[Install]
WantedBy=multi-user.target

However, I presume a Splunk update might restore the unit file to its old variant. What do I need to do to make Splunk's startup cgroup2-compliant?


mohare
Observer

I just noticed the same issue myself and would also like an answer.

Graham_W
Engager

It appears to be a path issue. The immediate cause of the failure is that one of the parent directories does not exist. Exploring the cgroup folder, I noticed that most services had created their own directories directly under /sys/fs/cgroup/system.slice. The separate cpu and memory directories do not exist there, which is why the chown fails. So I modified my Splunk systemd service file from

ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%

to

ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/%

In my case, I was running Splunk as the splunk user, not root, but the same failure also occurred on a host where I was running the Splunk processes as root.
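The path difference described above can be sketched as a small shell helper. This is only an illustration, not something Splunk ships: the function name splunk_cgroup_dirs is hypothetical, and it assumes that `stat -fc %T /sys/fs/cgroup` reports `cgroup2fs` on a cgroup2-only system and something else (typically `tmpfs`) on a cgroup1 or hybrid system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cgroup directory (or directories) that the
# ExecStartPost chown would have to target for a given unit name.
# $1 = filesystem type as reported by: stat -fc %T /sys/fs/cgroup
# $2 = unit name, e.g. Splunkd.service
splunk_cgroup_dirs() {
    local fstype="$1" unit="$2"
    if [ "$fstype" = "cgroup2fs" ]; then
        # Unified (cgroup2-only) hierarchy: a single directory per unit.
        printf '%s\n' "/sys/fs/cgroup/system.slice/${unit}"
    else
        # Legacy or hybrid hierarchy: one directory per controller.
        printf '%s\n' "/sys/fs/cgroup/cpu/system.slice/${unit}" \
                      "/sys/fs/cgroup/memory/system.slice/${unit}"
    fi
}

# Usage on a live system:
#   splunk_cgroup_dirs "$(stat -fc %T /sys/fs/cgroup)" Splunkd.service
```

Running the usage line before editing the unit file shows which of the two layouts the host actually has, so the chown targets can be checked with ls first.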

I had the same issue on some Fedora machines running the universal forwarder as well.
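On the worry that an update might restore the original unit file: one possible approach is to keep the changed ExecStartPost lines in a systemd drop-in instead of editing Splunkd.service itself. The sketch below is an assumption-laden example, not a confirmed Splunk procedure. The function name write_splunk_dropin and the file name cgroup2.conf are made up for illustration, and nothing in this thread confirms whether Splunk upgrades leave a Splunkd.service.d directory untouched. It relies on the systemd behavior that an empty ExecStartPost= assignment clears the commands inherited from the main unit file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that replaces the unit's
# ExecStartPost commands with a single chown on the cgroup2 path.
# $1 = drop-in directory, $2 = user:group to chown to
write_splunk_dropin() {
    local dropin_dir="$1" owner="$2"
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStartPost= resets the list from the main unit file;
    # the following line adds the unified-hierarchy path. %n is expanded
    # by systemd to the full unit name, so it is written out literally.
    cat > "${dropin_dir}/cgroup2.conf" <<EOF
[Service]
ExecStartPost=
ExecStartPost=/bin/bash -c "chown -R ${owner} /sys/fs/cgroup/system.slice/%n"
EOF
}

# Usage on a live system (as root), followed by a daemon reload:
#   write_splunk_dropin /etc/systemd/system/Splunkd.service.d splunk:splunk
#   systemctl daemon-reload
```

Afterwards, `systemctl cat Splunkd.service` shows the main unit file together with the drop-in, which makes it easy to verify that the override is still in place after an upgrade.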


isoutamo
SplunkTrust

daubsi_2
Explorer

Too bad 😞 Hope they will offer a solution in the near future though
