Getting Data In

Splunk exhausting memory crashing linux server

richard_whiffen
Explorer

Several of my RHEL servers, which run as Splunk indexers, have been crashing lately. They all end up in the same out-of-memory state, and the box is effectively hung.

There's nothing of note in /var/log and the only thing these boxes do is splunk indexing.

After rebooting, it comes up just fine, and has ample free memory:

[root@REXSPLUNK-CH2-03 log]# free -tm
             total       used       free     shared    buffers     cached
Mem:         32183       4183      27999          0        241       3171
-/+ buffers/cache:        770      31412
Swap:         4095          0       4095
Total:       36279       4183      32095

I could look into making swap bigger, but considering only 4 GB of the 32 GB of physical RAM is in use, I don't think that would help.

 uname -a
Linux REXSPLUNK-CH2-03.sys.comcast.net 2.6.18-238.19.1.el5 #1 SMP Sun Jul 10 08:43:41 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
 cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
 /opt/splunk/bin/splunk --version
Splunk 6.0.1 (build 189883)

We're going to install the latest Splunk release today to see if there is any improvement. In the meantime, how could we troubleshoot this further? Any ideas where to start?

0 Karma

BP9906
Builder

I have the same problem on RHEL 5.8 with 32 GB of RAM. Memory is consumed, and swap too. Looking at "sar -r" after a hard reboot shows the same pattern. My sar config collects data every minute. There is no CPU issue and no disk I/O issue (local disk, RAID 10).
This is Splunk v6.0.5. It only happens after splunkd has been running for a couple of weeks, which looks like a memory leak to me. Splunk support wants a diag, but that's pointless right now because the system is still healthy. All I saw at the time was lots of search requests from the search head to the indexer.
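
One way to test the leak theory would be to log splunkd's resident memory at intervals and see whether it climbs steadily between restarts. A minimal sketch, assuming a procps-style ps; the helper name sum_rss_kb and the log path are made up for illustration:

```shell
# Hypothetical helper: sum resident set sizes (in KB) read from stdin, one per line.
sum_rss_kb() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Example use on the indexer (assumes procps ps, which supports -C and -o rss=):
#   echo "$(date '+%F %T') $(ps -C splunkd -o rss= | sum_rss_kb)" >> /tmp/splunkd_rss.log
```

Run from cron every few minutes, this gives a per-process trend to compare against the system-wide numbers from sar -r.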

0 Karma

rsolutions
Path Finder

This article from Oracle explains how to manage the OOM killer. While you can tell it to be "nicer" to Splunk, even they don't recommend turning it off.

http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

We are going to disable THP too and see if it helps with our issue (which seems to be the same). You should look at /var/log/messages manually to confirm whether the OOM killer is killing splunkd. For some reason we were not seeing the messages in Splunk, but they were in /var/log/messages.
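
A rough sketch of that manual check follows. The function name oom_events is made up, and the grep patterns are an assumption based on the usual kernel wording ("invoked oom-killer", "Out of memory: Killed process ..."); the exact text can vary between kernel versions:

```shell
# Hypothetical helper: print OOM-killer related lines from a syslog file.
# Defaults to /var/log/messages; pass another path (e.g. a rotated log) as $1.
oom_events() {
    log_file="${1:-/var/log/messages}"
    # Patterns are assumptions about typical kernel messages; adjust as needed.
    grep -Ei 'oom-killer|out of memory' "$log_file"
}

# Example use (run as root so the log is readable):
#   oom_events | grep -i splunkd
```

If the last lines before a hang mention splunkd, that would point at the OOM killer rather than a kernel or hardware problem.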

0 Karma

rickblair
New Member

I am having the same problem. My indexers are on VMware and ran fine on Splunk 5. I am up to the latest RPM.

logindex2 ~]# uname -a
Linux ifw9bct-logindex2.fws.doi.net 2.6.18-371.8.1.el5 #1 SMP Thu Apr 24 18:19:36 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
logindex2 ~]# cat /etc/redhat-release
CentOS release 5.10 (Final)
logindex2 ~]# splunk --version
Splunk 6.1.1 (build 207789)

0 Karma

MuS
SplunkTrust

Not an answer, more like information: are you aware of this known issue with THP on RHEL and other *nix systems? http://docs.splunk.com/Documentation/Splunk/6.0.3/ReleaseNotes/SplunkandTHP
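
To see whether THP is active on a given box, something like the sketch below may help. The function name thp_status is made up. The two sysfs paths in the usage comment are the usual locations on mainline and RHEL 6 kernels respectively, and are an assumption for any other distribution:

```shell
# Hypothetical helper: print the active THP setting from the first readable
# file among the arguments, or "unavailable" if none of them can be read.
thp_status() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            # The active value is the bracketed word, e.g. "[always] madvise never".
            sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$f"
            return 0
        fi
    done
    echo "unavailable"
}

# Example use:
#   thp_status /sys/kernel/mm/transparent_hugepage/enabled \
#              /sys/kernel/mm/redhat_transparent_hugepage/enabled
```

If this prints "unavailable", the kernel probably has no THP support at all, and the cause is likely elsewhere.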

0 Karma

lguinn2
Legend

Thanks for your reply to my question - "could this be ulimit?" I pulled my answer, in hopes that an unanswered question will attract more views and a real answer!

0 Karma