I've encountered the following with a crashed Splunk forwarder running 4.3.3 on 64-bit Linux.
Splunk says it is running…
$ ./splunk status
splunkd is running (PID: 14371).
splunk helpers are running (PIDs: 14372).
But ps -ef shows otherwise.
No, Splunk is not running…
$ ps -ef|grep splunk|grep -v grep
splunk 6569 6560 0 18:04 pts/2 00:00:00 -ksh
splunk 26616 6569 0 18:19 pts/2 00:00:00 ps -ef
It seems different processes have taken over these PIDs…
$ ps -ef|egrep "14371|14372"
root 14371 2387 0 Jul23 ? 00:00:00 [rpciod/27]
root 14372 2387 0 Jul23 ? 00:00:00 [rpciod/28]
So Splunk tries to stop the wrong PID…
$ ./splunk stop
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
Could not kill pid 14371. [FAILED]
The start command also thinks Splunk is running, but it is not…
$ ./splunk start
The splunk daemon (splunkd) is already running. [FAILED]
I remove the stale pid file…
rm splunkforwarder/var/run/splunk/splunkd.pid
Splunk now starts…
$ ./splunk start
Splunk> Like an F-18, bro.
Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for typos...
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
[ OK ]
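The manual cleanup above could be guarded so that the pid file is removed only when it really is stale. The following is a rough sketch, not anything Splunk ships: `remove_stale_pid_file` is a hypothetical helper, and it assumes the first line of splunkd.pid holds the splunkd PID and that `ps -o comm=` reports the daemon as `splunkd`.

```shell
#!/bin/sh
# remove_stale_pid_file FILE NAME   (hypothetical helper, not part of Splunk)
# Deletes FILE when the PID on its first line does not belong to a live
# process whose command name is NAME.
# Returns 0 if the file was removed, non-zero if it was kept or is missing.
remove_stale_pid_file() {
    file=$1
    name=$2
    [ -f "$file" ] || return 1
    # Assumption: the first line of the pid file is the daemon's PID.
    pid=$(head -n 1 "$file")
    # ps prints the command name for that PID, or nothing if no such process.
    comm=$(ps -p "$pid" -o comm= 2>/dev/null)
    if [ "$comm" = "$name" ]; then
        return 1    # a real NAME process owns the PID, so keep the file
    fi
    rm -f "$file"
}
```

With this in place, the cleanup would be `remove_stale_pid_file splunkforwarder/var/run/splunk/splunkd.pid splunkd` followed by `./splunk start`. In the case shown here the file would be removed, because PID 14371 now belongs to rpciod/27 rather than splunkd.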
I've found the following bug mentioned in the 4.3.4 documentation.
http://docs.splunk.com/Documentation/Splunk/4.3.4/ReleaseNotes/KnownIssues
Forwarder startup script should handle stale PID files gracefully after server crashes. (SPL-36597)
Is this still an issue in 4.3.4, or has it been corrected there?
Also, I would like splunk status to check that the Splunk processes are actually running, not just that some process with that PID is running.
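To illustrate the kind of check I mean, here is a rough sketch that looks at what each recorded PID actually is instead of only whether it exists. `check_pid_file` is a hypothetical function of my own, and I am assuming that splunkd.pid lists one PID per line (splunkd first, then helpers) and that all of them show up in `ps -o comm=` as `splunkd`.

```shell
#!/bin/sh
# check_pid_file FILE NAME   (hypothetical helper, not part of Splunk)
# Prints one line per PID recorded in FILE. Succeeds only if FILE lists at
# least one PID and every listed PID is a live process named NAME.
check_pid_file() {
    file=$1
    name=$2
    found=0
    status=0
    [ -f "$file" ] || { echo "no pid file: $file"; return 1; }
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r pid || [ -n "$pid" ]; do
        [ -n "$pid" ] || continue
        found=1
        # Command name of the process holding this PID, empty if none.
        comm=$(ps -p "$pid" -o comm= 2>/dev/null)
        if [ "$comm" = "$name" ]; then
            echo "$pid: $name is running"
        elif [ -n "$comm" ]; then
            echo "$pid: stale, PID now belongs to $comm"
            status=1
        else
            echo "$pid: stale, no such process"
            status=1
        fi
    done < "$file"
    [ "$found" -eq 1 ] && [ "$status" -eq 0 ]
}
```

Run as `check_pid_file splunkforwarder/var/run/splunk/splunkd.pid splunkd`, this would have flagged both PIDs above as stale, since they now belong to rpciod kernel threads.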
Thanks,
Rob