Hi all,
For the second time this week, splunkd crashed on all three indexers in our Splunk cluster. Syslog shows the crash was caused by splunkd trying to access memory it isn't supposed to (a segfault).
Things seem to go wrong when the indexers try to return search results to one particular search head in the cluster. Below are the relevant entries from syslog and splunkd.log. The log files look the same on all three indexers in the cluster (the entries below are from INDEXER3).
Syslog:
Jan 13 09:57:10 INDEXER3 kernel: [490743.658693] splunkd[8748]: segfault at 0 ip 00007fbac3941da0 sp 00007fbab19fc9b0 error 4 in libc-2.19.so[7fbac38d3000+1bb000]
splunkd.log:
01-13-2016 09:57:10.358 +0100 ERROR SearchResultsWriter - Unable to open output file: path=/splunk/var/run/splunk/dispatch/remote_SEARCHHEAD2_subsearch_subsearch__cGF0cmljay52YW4uZGVuLmJyb2VrQHZhbmRlcmxhbmRlLmNvbQ_cGF0cmljay52YW4uZG$
So far I haven't been able to find out what went wrong. Anyone..?
My guess is that you've set too many limits to "max" or their highest possible value for your environment... and now someone is running searches so large that they're eating all of your RAM and asking for more.
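For example, the limits that most often get cranked up live in limits.conf. The stanzas below are only an illustration of the kind of settings worth reviewing; the values are placeholders, not recommendations for your environment:

# $SPLUNK_HOME/etc/system/local/limits.conf -- illustrative values only
[search]
# memory ceiling (in MB) for certain in-memory search operations
max_mem_usage_mb = 200

[subsearch]
# maximum number of results a subsearch may return
maxout = 10000
# maximum runtime of a subsearch, in seconds
maxtime = 60

[searchresults]
# maximum number of result rows returned
maxresultrows = 50000

Since your splunkd.log error points at a subsearch dispatch directory, the [subsearch] limits are worth a particularly close look.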
I suggest a thorough diagnosis of _internal and _audit to determine which search causes the failure, then optimize or disable that search and fix your limits so this doesn't happen again.
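As a rough starting point (assuming you know the approximate crash time from syslog), something like this against _audit lists completed searches with their runtimes so you can correlate them with the segfault:

index=_audit action=search info=completed
| stats max(total_run_time) AS run_time_s, latest(search) AS search_string BY user, search_id
| sort - run_time_s

Narrow the time range to a window around the crash and look for the long-running or unusually large searches coming from that one search head.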
You may also find the Splunk on Splunk (SOS) app or the Distributed Management Console (DMC) helpful for seeing which search is consuming the most RAM.
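If you'd rather query the data those dashboards use directly (resource-usage introspection has been collected since 6.1, so it should exist on a 6.2.0 indexer), a sketch like the following shows peak memory per search process. Field names can vary slightly between versions, so treat this as a starting point rather than a definitive query:

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS peak_mem_mb BY data.search_props.sid, data.search_props.user
| sort - peak_mem_mb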
I have 40 indexers. Why would this type of error come from only one particular indexer?
What version are you running on your indexer and search head tiers?
You should also open a ticket with Support; they can run through the crash and identify the issue.
Thanks for the reply, everything is running on 6.2.0.
I'll put in a ticket as well; I kinda hoped someone knew of an easy insta-fix 🙂