I've been fighting all day trying to figure out what keeps causing the above error when starting Splunk. Here is some background:
OS: CentOS Stream 9
Kernel: Linux 5.14.0-295.el9.x86_64
Splunk: splunk-9.0.4.1-419ad9369127-Linux-x86_64.tgz
Earlier today I ran version 8.2.9, but as it kept failing I thought it could be the Splunk version or one of the systemd-related issues I've read quite a bit about. After upgrading, it's still the same.
The service has been initiated as:
sudo /opt/splunk/bin/splunk enable boot-start -systemd-managed 1 -user splunk
The sudo systemctl status Splunkd shows:
× Splunkd.service - Systemd service file for Splunk, generated by 'splunk enable boot-start'
Loaded: loaded (/etc/systemd/system/Splunkd.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Wed 2023-08-02 19:53:42 CEST; 1h 37min ago
Duration: 983us
Process: 402262 ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd (code=exited, status=8)
Process: 402263 ExecStartPost=/bin/bash -c chown -R splunk:splunk /sys/fs/cgroup/system.slice/Splunkd.service (code=exited, status=0/SUCCESS)
Main PID: 402262 (code=exited, status=8)
CPU: 7ms
aug 02 19:53:42 localhost.localdomain systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Converting job Splunkd.service/restart -> Splunkd.service/start
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Consumed 7ms CPU time.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Start request repeated too quickly.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Failed with result 'exit-code'.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Service restart not allowed.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed dead -> failed
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Job 99417 Splunkd.service/start finished, result=failed
aug 02 19:53:42 localhost.localdomain systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Unit entered failed state.
If I run the ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd directly from the command line splunk starts without any problems - I don't get it.
I've edited: /etc/systemd/system.conf and added:
LogLevel=debug
And running: journalctl -xeu Splunkd.service writes:
░░
░░ The unit Splunkd.service completed and consumed the indicated resources.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Will spawn child (service_enter_start): /opt/splunk/bin/splunk
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: cgroup-compat: Applying [Startup]CPUShares=1024 as [Startup]CPUWeight=100 on /system.slice/Splunkd.service
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Failed to set 'io.weight' attribute on '/system.slice/Splunkd.service' to 'default 100': No such file or directory
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: cgroup-compat: Applying MemoryLimit=7922106368 as MemoryMax=
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Passing 0 fds to service
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: About to execute /opt/splunk/bin/splunk _internal_launch_under_systemd
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Forked /opt/splunk/bin/splunk as 402262
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Will spawn child (service_enter_start_post): /bin/bash
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: About to execute /bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/Splunkd.service"
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Forked /bin/bash as 402263
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed dead -> start-post
aug 02 19:53:42 localhost.localdomain systemd[1]: Starting Systemd service file for Splunk, generated by 'splunk enable boot-start'...
░░ Subject: A start job for unit Splunkd.service has begun execution
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit Splunkd.service has begun execution.
░░
░░ The job identifier is 99280.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: User lookup succeeded: uid=1002 gid=1002
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: User lookup succeeded: uid=1002 gid=1002
aug 02 19:53:42 localhost.localdomain systemd[402263]: Splunkd.service: Executing: /bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/Splunkd.service"
aug 02 19:53:42 localhost.localdomain systemd[402262]: Splunkd.service: Executing: /opt/splunk/bin/splunk _internal_launch_under_systemd
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Child 402263 belongs to Splunkd.service.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Control process exited, code=exited, status=0/SUCCESS (success)
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ An ExecStartPost= process belonging to unit Splunkd.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 0.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Got final SIGCHLD for state start-post.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed start-post -> running
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Job 99280 Splunkd.service/start finished, result=done
aug 02 19:53:42 localhost.localdomain systemd[1]: Started Systemd service file for Splunk, generated by 'splunk enable boot-start'.
░░ Subject: A start job for unit Splunkd.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit Splunkd.service has finished successfully.
░░
░░ The job identifier is 99280.
aug 02 19:53:42 localhost.localdomain splunk[402262]: Couldn't open "/opt/splunk/bin/../etc/splunk-launch.conf": Permission denied
aug 02 19:53:42 localhost.localdomain splunk[402262]: Couldn't open "/opt/splunk/bin/../etc/splunk-launch.conf": Permission denied
aug 02 19:53:42 localhost.localdomain splunk[402262]: ERROR: Couldn't determine $SPLUNK_HOME or $SPLUNK_ETC; perhaps one should be set in environment
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Child 402262 belongs to Splunkd.service.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Main process exited, code=exited, status=8/n/a
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ An ExecStart= process belonging to unit Splunkd.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 8.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit Splunkd.service has entered the 'failed' state with result 'exit-code'.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Service will restart (restart setting)
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed running -> failed
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Unit entered failed state.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Consumed 7ms CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit Splunkd.service completed and consumed the indicated resources.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed failed -> auto-restart
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Service RestartSec=100ms expired, scheduling restart.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Trying to enqueue job Splunkd.service/restart/replace
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Installed new job Splunkd.service/restart as 99417
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Enqueued job Splunkd.service/restart as 99417
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ Automatic restarting of the unit Splunkd.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed auto-restart -> dead
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Job 99417 Splunkd.service/restart finished, result=done
aug 02 19:53:42 localhost.localdomain systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
░░ Subject: A stop job for unit Splunkd.service has finished
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A stop job for unit Splunkd.service has finished.
░░
░░ The job identifier is 99417 and the job result is done.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Converting job Splunkd.service/restart -> Splunkd.service/start
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Consumed 7ms CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit Splunkd.service completed and consumed the indicated resources.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Start request repeated too quickly.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit Splunkd.service has entered the 'failed' state with result 'exit-code'.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Service restart not allowed.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Changed dead -> failed
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Job 99417 Splunkd.service/start finished, result=failed
aug 02 19:53:42 localhost.localdomain systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.
░░ Subject: A start job for unit Splunkd.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit Splunkd.service has finished with a failure.
░░
░░ The job identifier is 99417 and the job result is failed.
aug 02 19:53:42 localhost.localdomain systemd[1]: Splunkd.service: Unit entered failed state.
lines 2855-2974/2974 (END)
In relation to above error:
aug 02 22:14:58 localhost.localdomain systemd[404267]: Splunkd.service: Executing: /opt/splunk/bin/splunk _internal_launch_under_systemd
aug 02 22:14:58 localhost.localdomain splunk[404267]: Couldn't open "/opt/splunk/bin/../etc/splunk-launch.conf": Permission denied
aug 02 22:14:58 localhost.localdomain splunk[404267]: Couldn't open "/opt/splunk/bin/../etc/splunk-launch.conf": Permission denied
aug 02 22:14:58 localhost.localdomain splunk[404267]: ERROR: Couldn't determine $SPLUNK_HOME or $SPLUNK_ETC; perhaps one should be set in environment
aug 02 22:14:58 localhost.localdomain systemd[1]: Splunkd.service: Child 404267 belongs to Splunkd.service.
aug 02 22:14:58 localhost.localdomain systemd[1]: Splunkd.service: Main process exited, code=exited, status=8/n/a
The text:
"/opt/splunk/bin/../etc/splunk-launch.conf": Permission denied
does not make any sense, since the file's permissions are set as:
-rwxrwxrwx. 1 splunk splunk 765 2 aug 19:31 splunk-launch.conf
Any help would be highly appreciated.
How did you install the software? For rpm-based distros you have ready-made packages.
I always download the tgz files and unpack them directly in the /opt/splunk folder, if it's Splunk we're talking about?
Yes. I'm talking about the splunk installation method.
If you simply untar the archive you might not get proper SELinux labels (and might not have proper SELinux policies). Try disabling SELinux and see if Splunk starts.
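Before disabling anything, it can help to look at the SELinux side of the file directly. The sketch below is read-only and assumes the /opt/splunk path from this thread; it prints the current SELinux mode and the security context of splunk-launch.conf. The trailing dot in "-rwxrwxrwx." in the listing above already indicates that the file carries an SELinux context, which is checked independently of the classic permission bits. One possible explanation for the difference between the two start methods, not confirmed in this thread, is that a command run from an interactive shell usually runs in a far less restricted SELinux domain than a process launched by systemd.

```shell
# Paths taken from the thread; adjust if Splunk lives elsewhere.
SPLUNK_HOME=/opt/splunk
CONF="$SPLUNK_HOME/etc/splunk-launch.conf"

# Print the current SELinux mode (Enforcing, Permissive or Disabled).
if command -v getenforce >/dev/null 2>&1; then
    getenforce
else
    echo "getenforce not found; SELinux tools do not seem to be installed"
fi

# Show the SELinux context of the file next to its classic permissions.
if [ -e "$CONF" ]; then
    ls -lZ "$CONF"
else
    echo "$CONF not found"
fi
```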
Let me try to have a look at that.
But how come it works when I run the ExecStart command directly and not through systemctl?
OK, with SELinux disabled it actually starts, so far so good.
Now, what's needed to fix this, so it will also start with SELinux enabled?
In other words, do you know how to analyze what's needed?
Well, starting with SELinux disabled just confirms that improper SELinux configuration was the problem, but running it with SELinux disabled is more of a temporary workaround than a real solution.
Lately most of my installations are in environments where SELinux is disabled by default or not supported at all, so I'm not sure whether installation from RPM will actually provide correct SELinux labels for files and so on. And whether it works with SELinux in enforcing mode will depend on your configuration.
You can of course, as with any SELinux troubleshooting, try running your Splunk in permissive mode, then collect the output of auditd with audit2why or audit2allow and adjust your SELinux policies accordingly.
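As a rough sketch of that permissive-mode workflow (the function name collect_splunk_denials is made up for illustration, and the unit name Splunkd is the one from the status output above): switch to permissive mode, restart the service so the denials get logged instead of enforced, then let audit2why explain the recent AVC records. Note that setenforce only changes the mode until the next reboot.

```shell
# Sketch only: nothing runs until the function is called by hand.
collect_splunk_denials() {
    sudo setenforce 0                     # permissive: denials are logged, not enforced
    sudo systemctl reset-failed Splunkd   # clear the failed state left by the earlier start attempts
    sudo systemctl restart Splunkd
    # "-ts recent" limits the search to roughly the last 10 minutes.
    sudo ausearch -m avc -ts recent | audit2why
}

# Run it manually when ready:
#   collect_splunk_denials
```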
It's a bit embarrassing, as I've worked with Linux for years, but I never touched SELinux before now, and I must admit this is a completely new world, and a bit complex I must say.
I figured out how to pipe audit.log into audit2allow and see what needs to be adjusted, but I didn't manage to figure out exactly how and where to store these SELinux policies.
(If you know more, I'd be happy to learn some more 🙂)
Anyway - THANKS - you saved my long long day with this!!!
Most appreciated.
Cheers,
Bjarne
If you're mostly installing distro-provided packages, your whole interaction with SELinux might be limited to tweaking some booleans with setsebool here and there and that's it. It gets trickier if you're installing something from outside the distro or even configuring something distro-provided in a relatively unusual way.
So.
The easiest way to _see_ what is wrong and what needs to be done is to do
audit2allow < /var/log/audit/audit.log
(of course with root privs so you might do it as
sudo bash -c "audit2allow < /var/log/audit/audit.log"
if you're using sudo, not working directly as root)
But that will show you all SELinux "offences", not limited to just one process. Of course often it will be enough to see what's going on and if it needs just relabeling some files it might suffice.
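If the audit2allow output points at mislabeled files, a dry run of restorecon shows which labels would be reset to the policy defaults without touching anything. This is only a sketch: the function name preview_splunk_relabel is hypothetical, /opt/splunk is the path from this thread, and whether the default labels are enough for Splunk depends on the policy installed on the system.

```shell
# Preview which SELinux labels under the Splunk directory differ from the policy defaults.
preview_splunk_relabel() {
    # -R recurse, -n do not change any labels, -v print what would change
    sudo restorecon -Rnv "${1:-/opt/splunk}"
}

# Run it manually when ready:
#   preview_splunk_relabel
# Dropping -n would actually apply the default labels.
```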
If you want to limit to a single program (splunkd in our case) you can do
ausearch -c 'splunkd' --raw | audit2allow
And finally, if you want to automatically create a SELinux module which will allow all those actions which it would normally block for this process (remember, we're running in Permissive mode), you can do
ausearch -c 'splunkd' --raw | audit2allow -M my_splunkd_module
It will create and compile a module called my_splunkd_module.
If I remember correctly, it will even tell you how to load it but if it doesn't - you load it with
semodule -i my_splunkd_module.pp
Of course you must remember that this module will only contain rules for denials that were triggered before you created the module. Your splunkd might still try to do something not covered by the module in the future, so tweaking SELinux policies is an iterative process.
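One way to iterate, sketched under the assumption that the module name and unit name are the ones used in this thread (the function name recheck_splunk_under_enforcing is made up): confirm the module is loaded, switch back to enforcing mode, restart the service and look for any denials that still show up.

```shell
MODULE=my_splunkd_module   # module name used earlier in the thread

# Sketch only: nothing runs until the function is called by hand.
recheck_splunk_under_enforcing() {
    sudo semodule -l | grep -F "$MODULE"   # confirm the module is loaded
    sudo setenforce 1                      # back to enforcing mode
    sudo systemctl reset-failed Splunkd
    sudo systemctl restart Splunkd
    sleep 5                                # give the service a moment to fail or settle
    systemctl is-active Splunkd
    # Anything printed here is a denial the module does not cover yet.
    sudo ausearch -c 'splunkd' -m avc -ts recent --raw | audit2allow
}

# Run it manually when ready:
#   recheck_splunk_under_enforcing
```

If new rules show up, regenerating the module with audit2allow -M and loading it again with semodule -i repeats the cycle.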
Also - this is just a basic "brute-force" approach to allowing a program under SELinux. Architecting a proper SELinux policy - including separate file types, port labels and so on - is more complicated. Red Hat has relatively decent docs on SELinux.
Hi @PickleRick,
Thanks - you're the man of the day😉👍
Even more time saved, thanks a lot for your time and sharing.
Cheers,
Bjarne