I have a CloudFormation (CF) template that configures Splunk on a license/master server; the last step starts the Splunk service. The logged output shows Splunk starting up fine, but when I log into the EC2 instance, Splunk has died. splunkd.log shows this error:
08-03-2016 12:32:40.130 -0400 FATAL ProcessRunner - Unexpected EOF from process runner child!
08-03-2016 12:32:40.130 -0400 ERROR ProcessRunner - helper process seems to have died (child killed by signal 15: Terminated
We'll need more details about how you're deploying Splunk before we can help. Are there any other relevant lines in the log?
In the meantime, there are a few posts about the same error:
I'm deploying with the RPM package and then running these commands from the CloudFormation template to configure it:
/opt/splunk/bin/splunk enable boot-start -user root --accept-license
/opt/splunk/bin/splunk start --accept-license
/opt/splunk/bin/splunk edit cluster-config -mode master -replicationfactor 3 -searchfactor 3 -cluster_label splunkmaster
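For reference, here is a minimal sketch of those steps as one user-data fragment. It assumes bash and treats SPLUNK_HOME as an optional override of /opt/splunk, which is my assumption and not something from the post. It stops at the first failing command. It also ends with the restart that the later replies call "that last restart". The cluster-config flags are copied from the post unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the configuration commands from the post.
# SPLUNK_HOME is an assumed override; it defaults to the /opt/splunk path used above.
configure_splunk_master() {
  local splunk="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"
  # Each step returns early on failure so a half-configured instance is easy to spot.
  "$splunk" enable boot-start -user root --accept-license || return 1
  "$splunk" start --accept-license || return 1
  "$splunk" edit cluster-config -mode master -replicationfactor 3 -searchfactor 3 -cluster_label splunkmaster || return 1
  # Final restart (assumed from the thread) so the cluster-config change is picked up.
  "$splunk" restart || return 1
}
```

Calling configure_splunk_master at the end of the user-data script runs the four commands in order.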
Once it starts up, it dies right away after the reboot.
If I start the service manually, it stays up, which is very confusing.
It sounds like something is wrong with your init script. If you are running as root, you don't need to specify -user. Does starting Splunk manually show any errors or prompts? Does re-running enable boot-start fix it? Have you run this CFN template multiple times, and has every cluster you've launched had this problem?
No, starting it manually works fine: no errors or anything.
I did find the cause of the reboot problem, however: a cloud configuration setting was re-running the CloudFormation script on every reboot.
The issue now only occurs on new deployments, during the initial CloudFormation run: as soon as the script finishes, the Splunk process is killed.
And yes, I've run the CFT many times, with the same result each time.
Thanks for your help.
Gotcha. You might try doing that last restart with /etc/init.d/splunk restart instead. It's a long shot, but perhaps your user-data script completing causes the Splunk processes to exit as well.
Try creating a separate "job" or "task" to run the restart. My guess is that Splunk is being killed because the CloudFormation user exits, which orphans your process. You can try nohup, for example:
nohup /opt/splunk/bin/splunk restart &
Or simply backgrounding it may be enough:
/opt/splunk/bin/splunk restart &
You might also need to disown it before your script ends.
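Combining the three suggestions above (nohup, backgrounding, disown) could look like this minimal bash sketch. run_detached and the log path in the usage line are hypothetical names. One caveat: nohup and disown only protect against SIGHUP, while the log reports signal 15 (SIGTERM). If whatever runs the user-data script kills the whole process group, this may not be enough. Starting the command under setsid (util-linux) is another thing to try.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command in the background, detached from the calling script.
# nohup makes the command ignore SIGHUP; stdin comes from /dev/null and all output
# goes to the given log file. disown removes the job from bash's job table so bash
# will not pass a SIGHUP on to it.
run_detached() {
  local log="$1"; shift
  nohup "$@" >"$log" 2>&1 </dev/null &
  disown
}
```

Usage at the end of the user-data script (log path is only an example):
run_detached /var/log/splunk-restart.log /opt/splunk/bin/splunk restart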
A sleep command might even work:
/opt/splunk/bin/splunk restart && sleep 5
That waits for the restart command to return and, if it succeeds, sleeps for 5 seconds before the script continues.
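A fixed 5-second sleep is a guess. A sketch of an alternative is to poll until Splunk reports it is running, with a timeout. This assumes `splunk status` exits 0 only while splunkd is running, so check that on your version. wait_until_ok is a hypothetical helper. It only delays the end of the user-data script, so it may not help if Splunk is killed after the script exits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a command once per second until it succeeds,
# giving up (return 1) after roughly the given number of seconds.
wait_until_ok() {
  local timeout="$1"; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Example: /opt/splunk/bin/splunk restart && wait_until_ok 120 /opt/splunk/bin/splunk status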