The splunk-optimize process is launching on our indexers and eating up a few GB of memory, and then Red Hat's out-of-memory (OOM) killer kills the splunkd process (as seen in /var/log/messages). The 3 indexers have 16 CPUs and 16GB of RAM (total logging of 70GB/day across the 3 indexers), so it shouldn't be a resource issue. Aside from disabling the OOM killer (or reducing the likelihood of it killing Splunk, per http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html ), is there any other way to manage the splunk-optimize process to reduce the memory it uses?
There are options for tuning splunk-optimize in indexes.conf:
http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/Indexesconf
If splunk-optimize is using that many resources to do its thing, you might be running into bucket configuration issues as well. Have you increased bucket sizing (auto_high_volume) on the higher-volume indexes?
Here are some of the settings we've got on some of our higher-volume indexes (a commented example stanza follows this list):
maxMemMB = 20
maxConcurrentOptimizes = 6
maxHotIdleSecs = 86400
maxHotBuckets = 10
maxDataSize = auto_high_volume
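As a minimal sketch, these settings would sit in the stanza for the index in question in indexes.conf; the index name and paths below are placeholders, and the comments describe what each setting controls:

# Hypothetical per-index stanza in indexes.conf ("high_volume_index" is a placeholder)
[high_volume_index]
homePath   = $SPLUNK_DB/high_volume_index/db
coldPath   = $SPLUNK_DB/high_volume_index/colddb
thawedPath = $SPLUNK_DB/high_volume_index/thaweddb
# Memory (in MB) to allocate for indexing (per index thread, or per index if indexThreads = 0)
maxMemMB = 20
# How many splunk-optimize processes may run concurrently against the hot buckets
maxConcurrentOptimizes = 6
# Roll a hot bucket to warm if it receives no data for this many seconds (86400 = 1 day)
maxHotIdleSecs = 86400
# Allow up to 10 hot buckets per index before the oldest is rolled to warm
maxHotBuckets = 10
# auto_high_volume lets buckets grow to ~10GB on 64-bit systems (vs ~750MB for plain auto)
maxDataSize = auto_high_volume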
Be sure to read about transparent huge pages (THP) on Red Hat. Splunk recommends disabling THP.
http://docs.splunk.com/Documentation/Splunk/6.2.1/ReleaseNotes/SplunkandTHP
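As a rough illustration, checking and disabling THP on Red Hat-style systems looks like the following (the sysfs path varies; some RHEL 6 kernels use /sys/kernel/mm/redhat_transparent_hugepage/ instead, and a boot-time change is needed for the setting to persist across reboots):

# Check the current THP setting; the active value is shown in brackets, e.g. [always]
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Disable THP until the next reboot (run as root)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag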
We have confirmed THP is disabled. We also set the OOM adjustment to -16 for the Splunk process. We bumped the memory on the indexers to 32GB, but the indexers have since crashed again. We'll get back to Splunk support, but something has to be causing the splunk-optimize process to use much more memory than normal.
We're a little lost on this one, as we have never seen this before. We do have ES running in this environment, but other than that there isn't too much data, nor are there too many indexes. Any other thoughts or suggestions?
Did you come to a solution here?
The initial problem was that the indexers were crashing. Splunk determined it was a bug, and a fix was released as part of 6.0.3, I believe. The cause was an error with memory mapping between Splunk, the OS, and VMware.
I will definitely try disabling THP and see what happens. I'll let you know how it goes.
There are some settings around splunk-optimize and similar indexing helper processes in http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/indexesconf (search for "optimize"), but I'm not familiar enough with those to give a qualified recommendation. The directly memory-related settings all come with big caveats.
I would seek configuration relief from the OOM killer/OS, though; killing other people's processes is mean.
This is the article from Oracle that explains how to manage the OOM killer. While you can tell it to be "nicer" to Splunk, even they don't recommend turning it off.
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
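As a sketch of the "be nicer to Splunk" approach from that article, you lower the OOM adjustment for the splunkd PIDs. The older oom_adj interface ranges from -17 to +15 and the newer oom_score_adj from -1000 to +1000; the values below are examples, and the change must be reapplied after every splunkd restart because the PIDs change:

# Run as root after splunkd starts; pidof may return several splunkd PIDs
for pid in $(pidof splunkd); do
    echo -16 > /proc/$pid/oom_adj           # older kernels (RHEL 5/6 era)
    # echo -500 > /proc/$pid/oom_score_adj  # newer kernels: use this instead
done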
16GB is not a lot, especially when you have processes killed for being out of memory. Consider adding memory.
Thanks Martin. We originally started with 12GB of RAM, but bumped it to 16GB. The only time we seem to run into issues is when the splunk-optimize process runs; otherwise it is fine. The only other thing that is unique to this environment is that it runs Splunk ES (with a select few other apps).
I realize we can add more RAM, and that may ultimately be what is necessary. I'm just trying to figure out if there is anything else we can do to help manage it or reduce the memory requirements of the splunk-optimize process.