<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splunk service on indexer server gets killed by OOM killer when it should not. in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467207#M191872</link>
    <description>&lt;P&gt;We operate a Splunk platform (version 7.2.9) with 10+ SHC members and an indexer cluster of 100+ nodes. From time to time the Splunk service gets killed by the OOM killer on multiple indexers.&lt;BR /&gt;
The search below shows memory usage by Splunk processes, but when the OOM killer fires, usage is far less than 250GB (the maximum memory on each indexer) and, according to the graph, even less than 100GB.&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;P&gt;Search:&lt;BR /&gt;
index=_introspection host= sourcetype=splunk_resource_usage component=PerProcess&lt;BR /&gt;
| rename "data.args" as args, "data.process" as process, "data.process_type" as processt&lt;BR /&gt;
| eval process_class=case(((process == "splunkd") AND like(processt,"search")),"Splunk Search",((process == "splunkd") AND ((like(args,"-p %start%") AND (true() XOR like(args,"%process-runner%"))) OR (args == "service"))),"splunkd server",((process == "splunkd") AND isnotnull(sid)),"search",((process == "splunkd") AND ((like(args,"fsck%") OR like(args,"recover-metadata%")) OR like(args,"cluster_thing"))),"index service",((process == "splunkd") AND (args == "instrument-resource-usage")),"scripted input",((like(process,"python%") AND like(args,"%/appserver/mrsparkle/root.py%")) OR like(process,"splunkweb")),"Splunk Web",isnotnull(process_class),process_class)&lt;BR /&gt;
| bin _time span=10s&lt;BR /&gt;
| stats latest(data.mem_used) AS resource_usage_dedup latest(process_class) AS process_class by data.pid, _time&lt;BR /&gt;
| stats sum(resource_usage_dedup) AS resource_usage by _time, process_class&lt;BR /&gt;
| timechart minspan=10s bins=200 median(resource_usage) AS "Resource Usage" by process_class&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;Graph:&lt;BR /&gt;
&lt;IMG src="https://community.splunk.com/storage/temp/276772-idexer-memusagebysplk.png" alt="alt text" /&gt;&lt;/P&gt;&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;We found the search that caused the peak in the graph, but even that peak is far below the 250GB of memory available on the server.&lt;/P&gt;</description>
    <pubDate>Wed, 30 Sep 2020 03:23:37 GMT</pubDate>
    <dc:creator>sylim_splunk</dc:creator>
    <dc:date>2020-09-30T03:23:37Z</dc:date>
    <item>
      <title>Splunk service on indexer server gets killed by OOM killer when it should not.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467207#M191872</link>
      <description>&lt;P&gt;We operate a Splunk platform (version 7.2.9) with 10+ SHC members and an indexer cluster of 100+ nodes. From time to time the Splunk service gets killed by the OOM killer on multiple indexers.&lt;BR /&gt;
The search below shows memory usage by Splunk processes, but when the OOM killer fires, usage is far less than 250GB (the maximum memory on each indexer) and, according to the graph, even less than 100GB.&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;P&gt;Search:&lt;BR /&gt;
index=_introspection host= sourcetype=splunk_resource_usage component=PerProcess&lt;BR /&gt;
| rename "data.args" as args, "data.process" as process, "data.process_type" as processt&lt;BR /&gt;
| eval process_class=case(((process == "splunkd") AND like(processt,"search")),"Splunk Search",((process == "splunkd") AND ((like(args,"-p %start%") AND (true() XOR like(args,"%process-runner%"))) OR (args == "service"))),"splunkd server",((process == "splunkd") AND isnotnull(sid)),"search",((process == "splunkd") AND ((like(args,"fsck%") OR like(args,"recover-metadata%")) OR like(args,"cluster_thing"))),"index service",((process == "splunkd") AND (args == "instrument-resource-usage")),"scripted input",((like(process,"python%") AND like(args,"%/appserver/mrsparkle/root.py%")) OR like(process,"splunkweb")),"Splunk Web",isnotnull(process_class),process_class)&lt;BR /&gt;
| bin _time span=10s&lt;BR /&gt;
| stats latest(data.mem_used) AS resource_usage_dedup latest(process_class) AS process_class by data.pid, _time&lt;BR /&gt;
| stats sum(resource_usage_dedup) AS resource_usage by _time, process_class&lt;BR /&gt;
| timechart minspan=10s bins=200 median(resource_usage) AS "Resource Usage" by process_class&lt;/P&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;P&gt;Graph:&lt;BR /&gt;
&lt;IMG src="https://community.splunk.com/storage/temp/276772-idexer-memusagebysplk.png" alt="alt text" /&gt;&lt;/P&gt;&lt;/LI&gt;
&lt;/UL&gt;
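
&lt;P&gt;To correlate the kill times, the kernel ring buffer records which process and cgroup triggered each OOM kill; a standard journalctl invocation to pull those messages (the grep pattern is an illustration, not from the original post):&lt;BR /&gt;
journalctl -k | grep -i -E "out of memory|killed process"&lt;/P&gt;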

&lt;P&gt;We found the search that caused the peak in the graph, but even that peak is far below the 250GB of memory available on the server.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 03:23:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467207#M191872</guid>
      <dc:creator>sylim_splunk</dc:creator>
      <dc:date>2020-09-30T03:23:37Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk service on indexer server gets killed by OOM killer when it should not.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467208#M191873</link>
      <description>&lt;P&gt;On further investigation, the kernel log shows this:&lt;/P&gt;

&lt;P&gt;&lt;EM&gt;Dec 01 10:15:29 idx15 kernel: [72693.445279] Task in /system.slice/splunk.service killed as a result of limit of /system.slice/splunk.service&lt;BR /&gt;
Dec 01 10:15:29 idx15 kernel: [72693.445891] memory: usage 104857600kB, limit 104857600kB, failcnt 1977805211&lt;/EM&gt;&lt;/P&gt;
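
&lt;P&gt;You can confirm the effective cgroup limit on a live host with standard systemctl (the unit name splunk.service is taken from the log above; yours may differ):&lt;BR /&gt;
systemctl show splunk.service -p MemoryLimit&lt;/P&gt;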

&lt;P&gt;According to the log, splunkd hits the maximum usage defined by the splunk.service systemd unit, which is capped by "MemoryLimit=100G"; this limit appears to be static regardless of the memory available on the server.&lt;BR /&gt;
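A minimal drop-in override sketch to raise that limit (the 225G figure, roughly 90% of 250GB, is an illustrative assumption; run systemctl daemon-reload and restart the service after creating it):&lt;BR /&gt;
# /etc/systemd/system/splunk.service.d/override.conf&lt;BR /&gt;
[Service]&lt;BR /&gt;
MemoryLimit=225G&lt;BR /&gt;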
We decided to increase the value to 90% of the memory installed on the server. If you see similar symptoms, check the value of this parameter and adjust it accordingly.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Dec 2019 21:16:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467208#M191873</guid>
      <dc:creator>sylim_splunk</dc:creator>
      <dc:date>2019-12-17T21:16:06Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk service on indexer server gets killed by OOM killer when it should not.</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467209#M191874</link>
      <description>&lt;P&gt;This doc link covers it: from version 8.0 onward, Splunk sets the limit according to the memory available on the server.&lt;/P&gt;
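
&lt;P&gt;On upgraded hosts you can regenerate the managed unit so the new defaults take effect (a sketch; verify the flags against the linked docs for your version):&lt;BR /&gt;
$SPLUNK_HOME/bin/splunk disable boot-start&lt;BR /&gt;
$SPLUNK_HOME/bin/splunk enable boot-start -user splunk -systemd-managed 1&lt;/P&gt;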

&lt;P&gt;&lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/RunSplunkassystemdservice#Configure_systemd_using_enable_boot-start"&gt;https://docs.splunk.com/Documentation/Splunk/8.0.0/Admin/RunSplunkassystemdservice#Configure_systemd_using_enable_boot-start&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Tue, 17 Dec 2019 21:46:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Splunk-service-on-indexer-server-gets-killed-by-OOM-killer-when/m-p/467209#M191874</guid>
      <dc:creator>sylim_splunk</dc:creator>
      <dc:date>2019-12-17T21:46:46Z</dc:date>
    </item>
  </channel>
</rss>

