Monitoring Splunk

Why is the main splunkd (mothership) daemon on a standalone search head being killed by the OOM killer?

sylim_splunk
Splunk Employee

Our Enterprise Security search head was stopped by the OOM killer twice today. The attached graph shows the memory spikes, and the OOM killer kills splunkd with kernel messages like this:

Mar 28 00:29:38 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar 28 00:29:44 splunk-es kernel: [<ffffffff853ba4e4>] oom_kill_process+0x254/0x3d0
Mar 28 00:29:44 splunk-es kernel: [<ffffffff853b9f8d>] ? oom_unkillable_task+0xcd/0x120
Mar 28 00:29:45 splunk-es kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 28 00:29:45 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

This started today, in the middle of troubleshooting some search issues.
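
For reference, kernel messages like the ones above land in the kernel ring buffer and the system journal, so a quick way to confirm further OOM events is something along these lines (standard Linux tools, assuming root or sudo access):

dmesg -T | grep -iE 'oom-killer|out of memory'
journalctl -k --since today | grep -i oom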


sylim_splunk
Splunk Employee

It turns out this was caused by the debug option debug_metrics=true - the symptoms disappeared after debug_metrics was reverted back to false.
'debug_metrics=true' greatly expands the amount of perf data in the info.csv collected by the search process, because it breaks the perf data down by indexer - this environment has hundreds of indexers across multiple clusters. The aggregation does not occur and the data for each indexer is preserved, so the perf data becomes hundreds of times larger, and the search head has to load all of it for analysis.
Until a fix is available, please monitor the memory usage of splunkd whenever you use 'debug_metrics = true' in limits.conf.
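
For anyone hitting the same thing, reverting the setting on the search head would look roughly like the sketch below. The [search_metrics] stanza is my assumption based on where debug_metrics appears in the limits.conf spec; confirm the stanza and file location against the documentation for your Splunk version.

# $SPLUNK_HOME/etc/system/local/limits.conf (or the relevant app's local directory)
[search_metrics]
# revert the debug option so per-indexer perf data is aggregated again
debug_metrics = false

While the option is enabled, splunkd memory can be watched via the introspection data; a rough example search (field names assume the standard splunk_resource_usage sourcetype in _introspection):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process=splunkd
| timechart span=5m max(data.mem_used) AS splunkd_mem_used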

