I've encountered the following with a crashed Splunk forwarder running 4.3.3 on 64-bit Linux.
Splunk says it's running…
$ ./splunk status
splunkd is running (PID: 14371).
splunk helpers are running (PIDs: 14372).
But ps -ef shows otherwise: Splunk is not running…
$ ps -ef|grep splunk|grep -v grep
splunk 6569 6560 0 18:04 pts/2 00:00:00 -ksh
splunk 26616 6569 0 18:19 pts/2 00:00:00 ps -ef
Seems different processes have taken over these pids…
$ ps -ef|egrep "14371|14372"
root 14371 2387 0 Jul23 ? 00:00:00 [rpciod/27]
root 14372 2387 0 Jul23 ? 00:00:00 [rpciod/28]
Splunk is trying to stop the wrong PID…
$ ./splunk stop
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
Could not kill pid 14371.
[FAILED]
The start command also thinks splunk is running but it is not…
$ ./splunk start
The splunk daemon (splunkd) is already running. [FAILED]
I remove the stale pid file…
Splunk now starts…
$ ./splunk start
Splunk> Like an F-18, bro.
Checking prerequisites...
Checking mgmt port : open
Checking conf files for typos...
All preliminary checks passed.
Starting splunk server daemon (splunkd)... [ OK ]
I've found the following bug mentioned in the 4.3.4 documentation:
Forwarder startup script should handle stale PID files gracefully after server crashes. (SPL-36597)
Is this still a known issue in 4.3.4, or was it fixed in 4.3.4?
Also, I would like splunk status to check that the Splunk processes are actually running, not just that some process exists with the recorded PID.
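
To make that request concrete, here is a rough sketch of the kind of check I have in mind. It is not how Splunk does it. The function names pid_matches and pidfile_state are made up for illustration, and the sketch assumes that the first line of the pid file holds the splunkd PID and that ps supports the POSIX -p and -o options.

```shell
#!/bin/sh
# Sketch only: decide whether the PID recorded in a pid file really
# belongs to the expected daemon, instead of trusting that "some
# process with that PID exists".

# Return 0 if process $1 exists and its command name is $2.
pid_matches() {
    pid=$1
    expected=$2
    # "-o comm=" prints only the command name, with no header line.
    # ps exits non-zero when no process has this PID.
    name=$(ps -p "$pid" -o comm= 2>/dev/null) || return 1
    name=$(printf '%s' "$name" | tr -d ' ')
    # Some systems print a full path here, so compare the basename.
    [ "$(basename -- "$name")" = "$expected" ]
}

# Print "running", "stale", or "missing" for pid file $1.
# $2 is the expected command name (assumed default: splunkd).
pidfile_state() {
    pidfile=$1
    expected=${2:-splunkd}
    [ -r "$pidfile" ] || { echo "missing"; return 0; }
    # Assumption: the first line of the file is the daemon's PID.
    pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
    if [ -n "$pid" ] && pid_matches "$pid" "$expected"; then
        echo "running"
    else
        echo "stale"
    fi
}
```

With the PIDs from my example, a call such as pidfile_state /path/to/splunkd.pid (the path is a placeholder) should print "stale", because PID 14371 now belongs to rpciod/27 rather than splunkd. A start script could then remove the file and continue only in that case.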