I have some RHEL servers acting as my Splunk indexers that have been crashing lately. They all end up in the same out-of-memory state, and the box is effectively hung.
There's nothing of note in /var/log and the only thing these boxes do is splunk indexing.
After rebooting, it comes up just fine, and has ample free memory:
[root@REXSPLUNK-CH2-03 log]# free -tm
             total       used       free     shared    buffers     cached
Mem:         32183       4183      27999          0        241       3171
-/+ buffers/cache:        770      31412
Swap:         4095          0       4095
Total:       36279       4183      32095
I could look into making swap bigger, but with only 4 GB of the 32 GB of physical RAM in use, I don't think that would help.
uname -a
Linux REXSPLUNK-CH2-03.sys.comcast.net 2.6.18-238.19.1.el5 #1 SMP Sun Jul 10 08:43:41 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
/opt/splunk/bin/splunk --version
Splunk 6.0.1 (build 189883)
We're going to install the latest Splunk today to see if there is any improvement. In the meantime, how could we troubleshoot this further? Any ideas where to start?
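A first step worth taking before the next crash is confirming whether the kernel's OOM killer actually fired; it logs to the syslog and the kernel ring buffer even when the application logs show nothing. A minimal check (paths are RHEL defaults, including rotated logs):

```shell
# Look for OOM killer activity in syslog, including rotated copies:
grep -i "oom-killer\|out of memory" /var/log/messages* 2>/dev/null || true

# The kernel ring buffer also records which process it chose to kill:
dmesg | grep -i "killed process" | tail -n 5 || true
```

If splunkd shows up as the invoker or the victim here, that narrows the problem to splunkd's memory footprint rather than a kernel or hardware issue.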
I have the same problem. RHEL 5.8, 32 GB RAM. Memory is consumed, and then swap too. Checking "sar -r" after a hard reboot shows the same pattern; my sar config collects data every minute. No CPU issue, no disk I/O issue (local disk, RAID 10).
Splunk v6.0.5. It only happens to me after a couple of weeks of keeping splunkd running, which looks like a memory leak to me. Splunk support wants a diag, but that's pointless right now while the system is still healthy. All I saw at the time was lots of search requests from the search head to the indexer.
This is the article from Oracle that explains how to manage the OOM killer. While you can tell it to be "nicer" to Splunk, even they don't recommend turning it off.
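For reference, on 2.6.18-era RHEL 5 kernels the per-process knob is /proc/PID/oom_adj (the newer oom_score_adj interface arrived in later kernels). A sketch of biasing the OOM killer away from splunkd, in line with the advice above not to disable it outright:

```shell
# oom_adj ranges from -17 to 15. -17 exempts the process entirely
# (not recommended, per the article); a modest negative value such as
# -5 just makes the kernel prefer other victims first. Requires root.
for pid in $(pgrep splunkd); do
    echo -5 > "/proc/$pid/oom_adj" || true
done
```

Note this is not persistent across splunkd restarts, so it would need to go in an init script or similar.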
We are going to disable THP too and see if it helps with our issue (which seems to be the same). You should look at /var/log/messages manually to confirm whether the OOM killer is killing splunkd. For some reason we were not seeing the messages in Splunk, but they were in /var/log/messages.
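For anyone else trying the THP change: a runtime sketch is below. The sysfs path varies by distro (RHEL 6 uses redhat_transparent_hugepage, mainline kernels use transparent_hugepage; RHEL 5's 2.6.18 kernel predates THP entirely, so the paths simply won't exist there), so this probes both:

```shell
# Disable Transparent Huge Pages at runtime; probe both known sysfs paths.
# "|| true" tolerates a read-only /sys or missing defrag file.
for p in /sys/kernel/mm/transparent_hugepage /sys/kernel/mm/redhat_transparent_hugepage; do
    if [ -d "$p" ]; then
        echo never > "$p/enabled" || true
        echo never > "$p/defrag" 2>/dev/null || true
    fi
done
```

This does not survive a reboot; to make it permanent, add transparent_hugepage=never to the kernel boot line or put the commands in rc.local.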
I am having the same problem. My indexers are on VMware and ran fine on Splunk 5. I am up to the latest RPM.
logindex2 ~]# uname -a
Linux ifw9bct-logindex2.fws.doi.net 2.6.18-371.8.1.el5 #1 SMP Thu Apr 24 18:19:36 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
logindex2 ~]# cat /etc/redhat-release
CentOS release 5.10 (Final)
logindex2 ~]# splunk --version
Splunk 6.1.1 (build 207789)