Solved: Why is Systemd broken on new install?

48tfhd86gv · ‎09-16-2019

Hi,

I downloaded Splunk version 7.3.0 (build 657388c7a488) and installed it from the deb file onto a clean install of Debian 10.1.

I subsequently followed the "Configure systemd on a clean install" instructions (https://docs.splunk.com/Documentation/Splunk/7.3.1/Admin/RunSplunkassystemdservice)

However, running

sudo $SPLUNK_HOME/bin/splunk start

yields the following (same result if I "su -" to root instead of using sudo):

Splunk> Needle. Haystack. Found.

Checking prerequisites...
        Checking http port [8000]: open
        Checking mgmt port [8089]: open
        Checking appserver port [127.0.0.1:8065]: open
        Checking kvstore port [8191]: open
        Checking configuration... Done.
        Checking critical directories...        Done
        Checking indexes...
                Validated: _audit _internal _introspection _telemetry _thefishbucket history main summary
        Done
        Checking filesystem compatibility...  Done
        Checking conf files for problems...
        Done
        Checking default conf files for edits...
        Validating installed files against hashes from '/opt/splunk/splunk-7.3.0-657388c7a488-linux-2.6-x86_64-manifest'
        All installed files intact.
        Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Job for Splunkd.service failed because the control process exited with error code.
See "systemctl status Splunkd.service" and "journalctl -xe" for details.
Systemd manages the Splunk service. Use 'systemctl start Splunkd' to start the service. Root permission is required. Login as root user or use sudo.



# systemctl status Splunkd.service
● Splunkd.service - Systemd service file for Splunk, generated by 'splunk enable boot-start'
   Loaded: loaded (/etc/systemd/system/Splunkd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2019-09-16 19:05:05 BST; 1min 4s ago
  Process: 1655 ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd (code=killed, signal=TERM)
  Process: 1656 ExecStartPost=/bin/bash -c chown -R 1001:1001 /sys/fs/cgroup/cpu/init.scope/system.slice/Splunkd.service (code=exited, status=1/FAILURE)
 Main PID: 1655 (code=killed, signal=TERM)

Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Service RestartSec=100ms expired, scheduling restart.
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 5.
Sep 16 19:05:05 spl systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Start request repeated too quickly.
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Failed with result 'exit-code'.
Sep 16 19:05:05 spl systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.




--
-- The process' exit code is 'killed' and its exit status is 15.
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit Splunkd.service has entered the 'failed' state with result 'exit-code'.                                                                                 
Sep 16 19:05:05 spl systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.                                           
-- Subject: A start job for unit Splunkd.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit Splunkd.service has finished with a failure.
--
-- The job identifier is 2899 and the job result is failed.
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Service RestartSec=100ms expired, scheduling restart.                                                              
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Scheduled restart job, restart counter is at 5.                                                                    
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Automatic restarting of the unit Splunkd.service has been scheduled, as the result for                                                                           
-- the configured Restart= setting for the unit.
Sep 16 19:05:05 spl systemd[1]: Stopped Systemd service file for Splunk, generated by 'splunk enable boot-start'.                                                   
-- Subject: A stop job for unit Splunkd.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit Splunkd.service has finished.
--
-- The job identifier is 2975 and the job result is done.
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Start request repeated too quickly.                                                                                
Sep 16 19:05:05 spl systemd[1]: Splunkd.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit Splunkd.service has entered the 'failed' state with result 'exit-code'.                                                                                 
Sep 16 19:05:05 spl systemd[1]: Failed to start Systemd service file for Splunk, generated by 'splunk enable boot-start'.                                           
-- Subject: A start job for unit Splunkd.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit Splunkd.service has finished with a failure.
--
-- The job identifier is 2975 and the job result is failed.

48tfhd86gv · ‎09-17-2019

FYI, found my own answer...

It seems the Splunk systemd installer script is a bit dumb (for lack of a better word).

Apparently Splunk developers don't see fit to figure out the correct cgroup location for a given system.

So rather than "splunk enable boot-start -systemd-managed" checking for the cgroup location the Splunk developers chose and then raising an exception (or giving you the choice to enter it manually) if it can't find it, the script just installs anyway and leaves it to you to figure out why.

I can't say I'm impressed by either:
(a) The behaviour of "splunk enable boot-start -systemd-managed"
(b) The poor error handling by Splunk if the cgroup is incorrect (I mean seriously, all it had to do was throw an error saying cgroup not found!)

I apologise for the tone of the message, but frankly this problem took up far too many hours of my time yesterday.

Skeer-Jamf · ‎07-17-2023

This post led me to the /etc/systemd/system/SplunkForwarder.service file. Except now, on version 9.1.0.1, this line:

ExecStartPost=/bin/bash -c "chown -R <userid>:<groupid> /sys/fs/cgroup/system.slice/%n"

already exists. Since all objects under /sys/fs/cgroup are owned by root:root, I added the user `splunkfwd` to the `adm` group and then rebooted. After that I could use systemd to start/stop splunkforwarder.

garias_splunk · ‎12-21-2020

This could be SELinux blocking the service; check this answer:

https://community.splunk.com/t5/Archive/systemctl-start-SplunkForwarder-fails-error-203/m-p/533716/h...

bandit · ‎12-31-2019

Summary of the issue:
Splunk 6.0.0 - Splunk 7.2.1 defaults to using init.d when enabling boot start
Splunk 7.2.2 - Splunk 7.2.9 defaults to using systemd when enabling boot start
Splunk 7.3.0 - Splunk 8.x defaults to using init.d when enabling boot start

systemd defaults to prompting for root credentials upon stop/start/restart of Splunk

Here is a simple fix if you have encountered this issue and prefer to use the traditional init.d scripts vs systemd.

Splunk Enterprise/Heavy Forwarder example (note: replace the splunk user below with the account you run splunk as):

sudo /opt/splunk/bin/splunk disable boot-start
sudo /opt/splunk/bin/splunk enable boot-start -user splunk -systemd-managed 0

Splunk Universal Forwarder example (note: replace the splunk user below with the account you run splunk as):

sudo /opt/splunkforwarder/bin/splunk disable boot-start
sudo /opt/splunkforwarder/bin/splunk enable boot-start -user splunk -systemd-managed 0
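
Before and after switching, it can help to check which boot-start mechanism is actually in place. Below is a minimal sketch that only looks at which files exist. The unit file names are the ones mentioned elsewhere in this thread; `/etc/init.d/splunk` is my assumption for the init.d script name, and `boot_start_mode` is just a hypothetical helper, not a Splunk command.

```shell
#!/bin/sh
# Report how Splunk boot-start appears to be set up, judging only by which
# files exist. $1 is an optional filesystem root (handy for dry runs).
boot_start_mode() {
    root=${1:-}
    if [ -f "$root/etc/systemd/system/Splunkd.service" ] ||
       [ -f "$root/etc/systemd/system/SplunkForwarder.service" ]; then
        echo systemd
    elif [ -f "$root/etc/init.d/splunk" ]; then
        # Assumed init.d script name; adjust if yours differs.
        echo init.d
    else
        echo none
    fi
}

boot_start_mode
```

If the unit file was installed under a different name, the check above will not see it, so treat the result as a hint rather than proof.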

blaha1 · ‎02-14-2024

I guess Splunk 9.x defaults to systemd again. Any way to revert to init.d?

richgalloway · ‎09-17-2019

What was the solution? What should others do to avoid or fix this problem?

---
If this reply helps you, Karma would be appreciated.

48tfhd86gv · ‎09-17-2019

@kundeng @richgalloway

The workaround:

Scroll almost all the way to the bottom of https://docs.splunk.com/Documentation/Splunk/7.3.1/Admin/RunSplunkassystemdservice. Find the little blue comment box that talks about cgroups and think about what it means for your installation (i.e. what Splunk decided the cgroup should be is probably not the location on your system .... so go edit the systemd unit file that Splunk installed).
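
A rough way to see which cgroup paths the generated unit file refers to, and whether they can exist on your machine. This is only a sketch: `unit_cgroup_paths` is a made-up helper name, and the unit file location is the one shown in the `systemctl status` output at the top of the thread. It checks the parent directory of each path, because the service's own cgroup directory may only be present while the service is running.

```shell
#!/bin/sh
# Print every /sys/fs/cgroup path referenced by the ExecStartPost lines of a
# unit file read on stdin, with systemd's %n specifier replaced by the unit
# name given as $1.
unit_cgroup_paths() {
    grep '^ExecStartPost=' | grep -o '/sys/fs/cgroup[^" ]*' | sed "s#%n#$1#"
}

unit=Splunkd.service
if [ -r "/etc/systemd/system/$unit" ]; then
    unit_cgroup_paths "$unit" < "/etc/systemd/system/$unit" | while read -r p; do
        # The slice directory should exist even when the service is stopped.
        if [ -d "$(dirname "$p")" ]; then
            echo "parent exists   $p"
        else
            echo "parent MISSING  $p"
        fi
    done
fi
```

Any path reported with a missing parent is a candidate for editing in the unit file.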

The solution:

As I made clear, the real solution is that Splunk needs to write better software that (a) doesn't make hardcoded assumptions about the locations of files on systems and (b) has better error handling that provides useful messages instead of failing without reason.

kundeng · ‎09-17-2019

What was the solution?

bkresoja_2 · ‎01-26-2020

For me it was removing /init.scope from these lines in Splunkd.service:

ExecStartPost=/bin/bash -c "chown -R : /sys/fs/cgroup/cpu/init.scope/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R : /sys/fs/cgroup/memory/init.scope/system.slice/%n"
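
If you would rather script that edit than do it by hand, something like the following might work. It is a sketch, not anything Splunk ships: `strip_init_scope` is a hypothetical helper that reads a unit file on stdin and drops the `/init.scope` path component from the ExecStartPost lines only.

```shell
#!/bin/sh
# Remove the "/init.scope" path component from ExecStartPost lines of a unit
# file read on stdin. All other lines pass through unchanged.
strip_init_scope() {
    sed '/^ExecStartPost=/ s#/init\.scope/#/#'
}

# Example on one of the lines quoted above (result goes to stdout):
printf '%s\n' 'ExecStartPost=/bin/bash -c "chown -R : /sys/fs/cgroup/cpu/init.scope/system.slice/%n"' | strip_init_scope
```

To apply it for real, you could run the unit file through the function into a temporary copy, compare the two, copy the result back as root, and then run `systemctl daemon-reload` so systemd picks up the change.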

emallinger · ‎03-23-2022

Hi,

Do you know the impact of that ?

It worked for me as well, but I'd like to better understand what I'm really doing.

Thanks!

Ema

richgalloway · ‎09-17-2019

One of the error messages says "Use 'systemctl start Splunkd' to start the service", but I don't see where you tried that.

48tfhd86gv · ‎09-17-2019

@richgalloway

Yup, I tried that too.

Ultimately I found the solution, buried in an obscure comment deep in the manual.

It seems the Splunk systemd installer script is a bit dumb (for lack of a better word).

Apparently Splunk developers don't see fit to figure out the correct cgroup location for a given system.

So rather than "splunk enable boot-start -systemd-managed" checking for the cgroup location the Splunk developers chose and then raising an exception (or giving you the choice to enter it manually) if it can't find it, the script just installs anyway and leaves it to you to figure out why.

I can't say I'm impressed by either:
(a) The behaviour of "splunk enable boot-start -systemd-managed"
(b) The poor error handling by Splunk if the cgroup is incorrect (I mean seriously, all it had to do was throw an error saying cgroup not found!)

I apologise for the tone of the message, but frankly this problem took up far too many hours of my time yesterday.

AndyAtSplunk · ‎05-07-2022

Here is what fixed it for me -

Original /etc/systemd/system/Splunkd.service (working under Ubuntu 20.04 LTS):

ExecStartPost=/bin/bash -c "chown -R <userid>:<groupid> /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R <userid>:<groupid> /sys/fs/cgroup/memory/system.slice/%n"

On Ubuntu 22.04 LTS, the cgroup path does not include cpu or memory, so I modified /etc/systemd/system/Splunkd.service as follows:

ExecStartPost=/bin/bash -c "chown -R <userid>:<groupid> /sys/fs/cgroup/system.slice/%n"

To find the cgroup path, I ran this as root: sudo find /sys/fs/cgroup -name "*Splunk*" -print
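
The difference between the two Ubuntu releases above matches the split between the legacy per-controller cgroup layout and the unified (cgroup v2) layout, where `cgroup.controllers` sits at the top of the tree. Here is a small sketch that prints which path(s) the chown lines would need for each case. `cgroup_chown_paths` is a hypothetical helper, and it only covers the two layouts described in this post; other systems (for example the init.scope case earlier in the thread) may differ, so confirm with the find command.

```shell
#!/bin/sh
# Print the cgroup path(s) the ExecStartPost chown lines would target.
# $1 = cgroup mount point, $2 = unit name.
cgroup_chown_paths() {
    root=$1
    unit=$2
    if [ -f "$root/cgroup.controllers" ]; then
        # Unified hierarchy (cgroup v2): a single tree.
        printf '%s\n' "$root/system.slice/$unit"
    else
        # Legacy hierarchy (cgroup v1): one tree per controller.
        printf '%s\n' "$root/cpu/system.slice/$unit" \
                      "$root/memory/system.slice/$unit"
    fi
}

cgroup_chown_paths /sys/fs/cgroup Splunkd.service
```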

richgalloway · ‎09-17-2019

Please post your solution as an answer and accept it to help future readers.

48tfhd86gv · ‎09-17-2019

Will do. Thanks for dropping by !
