splunkd died every day with the same error
FATAL ProcessRunner - Unexpected EOF from process runner child!
ERROR ProcessRunner - helper process seems to have died (child killed by signal 9: Killed)!
I can't see anything that might have caused this... it doesn't even last 24 hours after a restart...
here's the partial log:
04-13-2013 13:37:03.498 +0000 WARN FilesystemChangeWatcher - error getting attributes of path "/home/c9logs/c9logs/edgdc2/sdi_slce28vmf6011/.zfs/snapshot/.auto-1365364800/config/m_domains/tasdc2_domain/servers/AdminServer/adr": Permission denied
04-13-2013 13:37:03.499 +0000 WARN FilesystemChangeWatcher - error getting attributes of path "/home/c9logs/c9logs/edgdc2/sdi_slce28vmf6011/.zfs/snapshot/.auto-1365364800/config/m_domains/tasdc2_domain/servers/AdminServer/sysman": Permission denied
04-13-2013 13:38:37.102 +0000 FATAL ProcessRunner - Unexpected EOF from process runner child!
04-13-2013 13:38:37.325 +0000 ERROR ProcessRunner - helper process seems to have died (child killed by signal 9: Killed)!
Your ulimits are not set correctly, or are using the system defaults.
As a result, splunkd is likely using more memory than allowed or available, so the kernel kills the process in order to protect itself.
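As a rough check (a sketch, assuming a Linux host and that you run it as the same user that starts splunkd), you can compare the limits the shell would hand to splunkd against what the live process actually inherited:

```shell
#!/bin/sh
# Show the limits for the current shell; run as the user that starts splunkd.
ulimit -a

# Inspect the running splunkd process directly -- these are the limits that
# actually apply to it, regardless of what your login shell reports.
SPLUNKD_PID=$(pgrep -o splunkd)
if [ -n "$SPLUNKD_PID" ]; then
    grep -E 'Max open files|Max processes|Max address space' "/proc/$SPLUNKD_PID/limits"
fi
```

If these show small defaults (e.g. 1024 open files), raise them in `/etc/security/limits.conf` (or the init/systemd unit) and restart Splunk.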
Did you get this resolved?
Can you check and confirm whether splunkd was getting killed right after an active session was terminated, i.e., as soon as someone logged out of your Splunk session or the server, and whether it died after that?
We had this problem with an infinite loop inside a macro (calling itself) even though we had [search] limits.conf set up on memory.
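In case it helps others hitting the same thing, here is a rough way to spot a macro whose definition references itself (a sketch, not official tooling; it assumes macros live in the usual `macros.conf` files under `$SPLUNK_HOME/etc`):

```shell
#!/bin/sh
# Scan every macros.conf for a stanza whose definition mentions its own name.
# awk remembers the current [stanza] header (stripping any "(n)" argument-count
# suffix) and flags definition lines that invoke that same macro again.
find "${SPLUNK_HOME:-/opt/splunk}/etc" -name macros.conf 2>/dev/null |
while read -r conf; do
    awk -v file="$conf" '
        /^\[/ { name = substr($0, 2, length($0) - 2); sub(/\([0-9]+\)$/, "", name) }
        /^definition/ && name != "" && index($0, "`" name) {
            print file ": macro [" name "] appears to call itself"
        }
    ' "$conf"
done
```

This only catches direct self-references; a loop through two or more macros calling each other would need a deeper check.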
How did you find the macro that was causing issues and calling itself? That would help me validate the same on my end.
We correlated it with changes made that day.
My .02 is that this is memory related. I am having the same issue and a check on /var/log/messages shows:
Apr 20 01:59:06 splog1 kernel: Out of memory: Kill process 45929 (splunkd) score 17 or sacrifice child
Apr 20 01:59:06 splog1 kernel: Killed process 45934, UID 5000, (splunkd) total-vm:66104kB, anon-rss:1260kB, file-rss:4kB
This was happening on a new instance of Enterprise 6.5.3. I traced it to an input source that was particularly large and hadn't been indexed for a while due to the upgrade. I had to restart splunkd a few times on the indexer, and now it's running well.
Was this ever resolved?
Check syslog/dmesg to see if the kernel's oom_killer is getting invoked
Out of memory: Kill process 7575 (splunkd) score 201 or sacrifice child
Killed process 7576, UID 1000, (splunkd) total-vm:70232kB, anon-rss:392kB, file-rss:152kB
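A quick way to look for those messages (a sketch, assuming a Linux box; the syslog path varies by distro, and older `dmesg` output may have rotated out of the ring buffer):

```shell
#!/bin/sh
# Look for OOM-killer activity mentioning splunkd in the kernel ring buffer.
dmesg | grep -iE 'out of memory|oom|killed process' | grep -i splunkd

# On syslog-based systems the same messages usually land in /var/log/messages
# (RHEL/CentOS) or /var/log/syslog (Debian/Ubuntu).
grep -i 'out of memory' /var/log/messages /var/log/syslog 2>/dev/null | grep -i splunkd
```

If you see splunkd in these lines, the kernel is killing it for memory pressure, and the fix is on the memory side (limits, search load, or host sizing), not in Splunk's own logs.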
Signal 9 is a KILL signal from an external process. It is likely that your OS has some kind of monitor or other setting on it that kills processes that do certain things. Perhaps your administrator is watching for memory usage, access to certain files, or other things. You should consult with your system admin to find out what they have put in place.