Monitoring Splunk

splunkd PID going into "defunct" state after restart

splunkuseradmin
Path Finder

Hi all,

I'm running into an issue that may be a known one; please help.
Every so often the splunkd service on one or another of our indexers goes into a "defunct" state. When I then try to restart the service, all of the Splunk-related PIDs go into that state as well, so I can't restart the service either.

Initially, ps showed the "defunct" process below, so I tried to restart splunk.service.
[root@hostname ~]# ps -ef | grep defunct
root 23399 23167 0 21:49 pts/1 00:00:00 grep --color=auto defunct
svc.spl+ 32079 4383 0 09:07 ? 00:00:03 [splunkd] <defunct>

[root@hostname ~]#

I tried to restart and the status is below.

[root@hostname ~]# systemctl status splunk.service
● splunk.service - Splunk
Loaded: loaded (/usr/lib/systemd/system/splunk.service; enabled; vendor preset: disabled)
Active: activating (start) since Sun 2020-02-02 22:45:54 MST; 1min 9s ago
Process: 29434 ExecStop=/opt/splunk/bin/splunk stop (code=killed, signal=TERM)
Main PID: 4381; : 32121 (splunk)
CGroup: /system.slice/splunk.service
└─32121 /opt/splunk/bin/splunk start --answer-yes --no-prompt --accept-license
‣ 4381 [splunkd]

Feb 02 22:45:54 hostnme.domain.com systemd[1]: Starting Splunk...
Feb 02 22:45:54 hostnme.domain.com splunk[32121]: splunkd 4381 was not running.
Feb 02 22:45:54 hostnme.domain.com splunk[32121]: Stopping splunk helpers...

[root@hostname ~]# ps -ef | grep splunk
svc.spl+ 4381 1 99 Jan30 ? 5-12:21:04 [splunkd]
svc.spl+ 7638 1 0 09:05 ? 00:00:37 [splunkd]
svc.spl+ 7639 7638 0 09:05 ? 00:00:00 [splunkd]
svc.spl+ 26167 1 0 09:06 ? 00:00:14 [splunkd]
svc.spl+ 26168 26167 0 09:06 ? 00:00:00 [splunkd]
svc.spl+ 32079 1 0 09:07 ? 00:00:03 [splunkd]
svc.spl+ 32080 32079 0 09:07 ? 00:00:00 [splunkd]
svc.spl+ 32121 1 0 22:45 ? 00:00:00 /opt/splunk/bin/splunk start --answer-yes --no-prompt --accept-license
root 32281 23128 0 22:47 pts/1 00:00:00 grep --color=auto splunk
svc.spl+ 36590 1 0 09:08 ? 00:00:03 [splunkd]
svc.spl+ 36597 36590 0 09:08 ? 00:00:00 [splunkd]
svc.spl+ 36744 1 0 09:08 ? 00:00:08 [splunkd]
svc.spl+ 36746 36744 0 09:08 ? 00:00:00 [splunkd]
svc.spl+ 45222 1 0 09:10 ? 00:00:01 [splunkd]
svc.spl+ 45224 45222 0 09:10 ? 00:00:00 [splunkd]

Any help would be appreciated.

Thank you.


codebuilder
Influencer

This can happen when you initially install/start Splunk as root but later change the owner to "splunk". To recover:
Stop Splunk gracefully (systemctl stop splunk, or whatever your unit is named).
Check for any remaining processes (ps -ef | grep -i splunk) and kill any that remain (e.g. kill -9 <PID>).
Ensure that Splunk is set to run as the user you intend; check /opt/splunk/etc/splunk-launch.conf.
Restart Splunk and verify. Note: if you are using systemd, also update the user in the unit file (/etc/systemd/system/splunkd.service, or whatever you named it).
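
The ownership check in the steps above can be sketched in shell. This is a minimal sketch, not Splunk tooling: the function names configured_user and wrong_owner are made up for illustration, and it assumes the run-as user is set through a SPLUNK_OS_USER line in splunk-launch.conf and that Splunk lives under /opt/splunk as in the original post.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of Splunk.

# Print the SPLUNK_OS_USER value from a splunk-launch.conf file.
# Prints nothing if the setting is absent or commented out.
# $1: path to splunk-launch.conf
configured_user() {
    sed -n 's/^[[:space:]]*SPLUNK_OS_USER[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# List everything under a directory that is NOT owned by the given user.
# $1: directory to scan (e.g. /opt/splunk)
# $2: expected owner (user name or numeric uid)
wrong_owner() {
    find "$1" ! -user "$2" -print
}
```

For example, wrong_owner /opt/splunk "$(configured_user /opt/splunk/etc/splunk-launch.conf)" prints nothing when every file already belongs to the configured user. If configured_user itself prints nothing, the setting is not present in that file, so pass the intended user name to wrong_owner by hand instead.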

----
An upvote would be appreciated and Accept Solution if it helps!

harsmarvania57
Ultra Champion

Have you looked at dmesg or /var/log/messages? It looks like the Splunk process was killed, but to find out why it was killed you need to check the OS logs (for example, for the OOM killer).
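
One way to do that check, as a rough sketch: filter the kernel log for lines that usually accompany an OOM kill or a hung task. The function name kernel_kill_hints and the exact patterns are assumptions for illustration; kernel message wording varies between versions, so treat an empty result as inconclusive rather than as proof that nothing happened.

```shell
#!/bin/sh
# Sketch only: the helper name and pattern list are illustrative.

# Read kernel log lines on stdin and keep the ones that hint at the
# OOM killer or the hung-task detector. Matching is case-insensitive.
# Like grep, it exits non-zero when no line matches.
kernel_kill_hints() {
    grep -iE 'oom-killer|out of memory|killed process|blocked for more than [0-9]+ sec'
}
```

Typical use would be dmesg -T | kernel_kill_hints for the current boot, or kernel_kill_hints < /var/log/messages for the persisted log.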


splunkuseradmin
Path Finder

I see some hung-task kernel messages saying "Splunkd : is blocked for more than 120 secs".
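
The kernel's hung-task detector reports tasks that have been stuck in uninterruptible sleep (state D) for longer than its timeout, so it can help to see which processes are in state D or are zombies (state Z). A minimal sketch, assuming the procps version of ps found on most Linux distributions; the function names stuck_only and list_stuck are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# Read "PID PPID STAT USER COMMAND" rows on stdin. Keep the header row
# plus any row whose state starts with D (uninterruptible sleep) or
# Z (zombie / defunct).
stuck_only() {
    awk 'NR == 1 || $3 ~ /^[DZ]/'
}

# List stuck processes on this host together with their parent PID.
list_stuck() {
    ps -e -o pid,ppid,stat,user,comm | stuck_only
}
```

Running list_stuck shows the parent PID next to each stuck process. A zombie cannot be removed with kill -9; it goes away once its parent reaps it or exits. A process in state D is waiting inside the kernel, commonly on I/O, and will not act on signals until that wait ends.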
