Monitoring Splunk

Why is the main splunkd (mothership) daemon on a standalone search head being killed by the OOM killer?

sylim_splunk
Splunk Employee

Our Enterprise Security search head was stopped by the OOM killer twice today. The attached graph shows the memory spikes, and the OOM killer kills splunkd with kernel messages like this:

Mar 28 00:29:38 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar 28 00:29:44 splunk-es kernel: [<ffffffff853ba4e4>] oom_kill_process+0x254/0x3d0
Mar 28 00:29:44 splunk-es kernel: [<ffffffff853b9f8d>] ? oom_unkillable_task+0xcd/0x120
Mar 28 00:29:45 splunk-es kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 28 00:29:45 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

This started today, in the middle of troubleshooting some search issues.
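
For reference, kernel messages like the ones above land in the kernel ring buffer and the system journal, so a quick way to confirm further OOM events is something along these lines (standard Linux tools, assuming root or sudo access):

dmesg -T | grep -iE 'oom-killer|out of memory'
journalctl -k --since today | grep -i oom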


sylim_splunk
Splunk Employee

It turns out this was caused by the debug option debug_metrics=true - the symptoms disappeared after debug_metrics was reverted back to false.
'debug_metrics=true' greatly expands the amount of perf data in the info.csv collected by the search process, because it breaks the perf data down by indexer - this environment has hundreds of indexers across multiple clusters. The aggregation does not occur and the data for each indexer is preserved, so the perf data becomes hundreds of times larger, and the search head has to load all of it for analysis.
Until a fix is available, please monitor the memory usage of splunkd whenever you use 'debug_metrics = true' in limits.conf.
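
For anyone hitting the same thing, reverting the setting on the search head would look roughly like the sketch below. The [search_metrics] stanza is my assumption based on where debug_metrics appears in the limits.conf spec; confirm the stanza and file location against the documentation for your Splunk version.

# $SPLUNK_HOME/etc/system/local/limits.conf (or the relevant app's local directory)
[search_metrics]
# revert the debug option so per-indexer perf data is aggregated again
debug_metrics = false

While the option is enabled, splunkd memory can be watched via the introspection data; a rough example search (field names assume the standard splunk_resource_usage sourcetype in _introspection):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process=splunkd
| timechart span=5m max(data.mem_used) AS splunkd_mem_used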

