Installation

Splunk doesn't start on cgroup2 only system

daubsi_2
Explorer

I installed Splunk on a cgroup1/2 hybrid system and used "splunk enable boot-start -systemd-managed 1" to start it at boot.

Yesterday I switched to a cgroup2 only system by disabling the usage of cgroup1 via grub/kernel boot parameters.
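One way to confirm which cgroup layout is active after such a switch is to look at the filesystem type mounted at /sys/fs/cgroup. This is only a minimal sketch, assuming GNU stat is available; the function names are made up for illustration.

```shell
# Map the filesystem type of the cgroup mount point to a cgroup layout.
# On a cgroup2-only (unified) system the mount is of type cgroup2fs; on
# cgroup1 and hybrid systems /sys/fs/cgroup is normally a tmpfs that holds
# one directory per controller (cpu, memory, ...).
classify_cgroup_fstype() {
  case "$1" in
    cgroup2fs) echo "v2-only" ;;
    tmpfs)     echo "v1-or-hybrid" ;;
    *)         echo "unknown" ;;
  esac
}

# Hypothetical helper: classify the live system (GNU stat assumed).
cgroup_mode() {
  classify_cgroup_fstype "$(stat -fc %T "${1:-/sys/fs/cgroup}")"
}
```

Calling cgroup_mode with no arguments inspects /sys/fs/cgroup. A result of "v2-only" would mean the per-controller cpu and memory directories are not expected to exist.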

Now Splunk no longer starts because a path in the cgroup1 file system hierarchy is no longer present:

Jan 22 10:25:54 bigigloo systemd[1]: Stopping Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:30:58 bigigloo systemd[1]: Splunkd.service: Killing process 2847689 (python3.7) with signal SIGKILL.
Jan 22 10:30:58 bigigloo systemd[1]: Splunkd.service: Succeeded.
Jan 22 10:30:58 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
-- Reboot --
Jan 22 10:36:19 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:19 bigigloo bash[3180]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3393 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3408 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:22 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo bash[3475]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 1.
Jan 22 10:36:23 bigigloo bash[3480]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:23 bigigloo bash[3496]: chown: cannot access '/sys/fs/cgroup/cpu/system.slice/Splunkd.service': No such file or directory
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3476 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3477 (btool) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:22 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 2.
Jan 22 10:36:22 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:22 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3481 (sh) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Killing process 3482 (btool) with signal SIGKILL.
Jan 22 10:36:22 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:22 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 3.
Jan 22 10:36:23 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Control process exited, code=exited, status=1/FAILURE
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Killing process 3497 (sh) with signal SIGKILL.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Killing process 3499 (btool) with signal SIGKILL.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Jan 22 10:36:23 bigigloo systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 4.
Jan 22 10:36:23 bigigloo systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Jan 22 10:36:23 bigigloo systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...

I tracked the problem down to the two ExecStartPost commands in the unit file /etc/systemd/system/Splunkd.service. Commenting out those two lines fixed the problem.

#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=360
LimitNOFILE=65536
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=root
Group=root
Delegate=true
CPUShares=1024
MemoryLimit=20868083712
PermissionsStartOnly=true
#ExecStartPost=/bin/bash -c "chown -R root:root /sys/fs/cgroup/cpu/system.slice/%n"
#ExecStartPost=/bin/bash -c "chown -R root:root /sys/fs/cgroup/memory/system.slice/%n"

[Install]
WantedBy=multi-user.target
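The workaround of commenting out the two ExecStartPost lines can be sketched as a small filter. This is an illustration rather than a supported procedure: the function name is hypothetical, and it prints the modified unit to standard output instead of editing the file in place.

```shell
# Print a unit file with every ExecStartPost line that refers to the
# cgroup1 cpu or memory hierarchy commented out. All other lines,
# including lines that are already commented, pass through unchanged.
comment_out_cgroup1_chown() {
  sed -E 's,^(ExecStartPost=.*/sys/fs/cgroup/(cpu|memory)/.*)$,#\1,' "$1"
}
```

For example, running comment_out_cgroup1_chown /etc/systemd/system/Splunkd.service shows the result for review before anything is written back. After changing a unit file, systemd needs a "systemctl daemon-reload" to pick it up.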

However, I presume that a Splunk update might restore the unit file to its old form. What do I need to do to make Splunk's startup cgroup2 compliant?

0 Karma
1 Solution

mohare
Observer

I just noticed the same issue myself and would also like an answer.

0 Karma

Graham_W
Engager

It appears to be a path issue. The immediate cause of the failure is that one of the parent directories does not exist. Looking around the cgroup directory, I noticed that most services had created their own directories directly under /sys/fs/cgroup/system.slice, and that the separate cpu and memory directories do not exist, which is why the chown fails. So I just changed my Splunk systemd service file from

ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/memory/system.slice/%

to

ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/%

In my case, I was running splunk as splunk, not root, but the same occurred on a host where I was running the splunk processes as root as well.

I had the same issue on some Fedora machines running the universal forwarder as well.
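The path change described above could also be kept in a systemd drop-in instead of in the generated unit file, so that it lives in a separate file from Splunkd.service. The following is a sketch under assumptions, not something confirmed in this thread: the function name and the file name override.conf are illustrative, and the unit's cgroup is assumed to sit at /sys/fs/cgroup/system.slice/%n as observed above.

```shell
# Write a drop-in that replaces the unit's ExecStartPost commands.
#   $1: drop-in directory, e.g. /etc/systemd/system/Splunkd.service.d
#   $2: user (and group) that should own the service's cgroup directory
write_splunk_dropin() {
  dropin_dir="$1"
  owner="$2"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/override.conf" <<EOF
[Service]
# An empty assignment clears the ExecStartPost list from the main unit file.
ExecStartPost=
# Assumed cgroup2 location of the unit's cgroup (one line covers cpu and memory).
ExecStartPost=/bin/bash -c "chown -R $owner:$owner /sys/fs/cgroup/system.slice/%n"
EOF
}
```

A possible invocation would be write_splunk_dropin /etc/systemd/system/Splunkd.service.d splunk, followed by "systemctl daemon-reload" and a restart of the service. Whether a Splunk upgrade leaves such a drop-in untouched has not been verified here.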

0 Karma

isoutamo
SplunkTrust

daubsi_2
Explorer

Too bad 😞 Hope they will offer a solution in the near future though

0 Karma