I have a splunk cluster (RHEL) with 2 indexers, and they seem to have a memory leak. Memory usage grows steadily until it runs out of memory and the OOM Killer kills the splunkd process. The only way to get the memory back is to restart the server (restarting splunk doesn't help). I've got two clusters that do this, and a few single instance splunk servers that never do this. Oddly enough, it's usually one of the indexers in the cluster that eats up memory first. ulimit and THP is set properly on all the servers. Has this happened to anyone?
On your indexers, create or update $SPLUNK_HOME/etc/system/local/limits.conf and add the following:
[defaults]
max_mem_usage_mb = 2000
Where 2000 is the amount of memory you want to restrict Splunk to use for the search process on that node.
Cycle Splunk.
You also may want to evaluate the number of scheduled searches you have running, or the type of ad-hoc queries may be using.
Too many of either can cause the issue as well.
@gregbo which version of Splunk are you running on?
If you have Splunk Entitlement you should work with Splunk Support by providing them with heap dump, diag file and dispatch directory with debug level details.