Monitoring Splunk

Why is the splunkd mothership daemon on a standalone search head being killed by the OOM killer?

sylim_splunk
Splunk Employee

Our Enterprise Security search head was stopped by the OOM killer twice today. The attached graph shows memory spikes, and the OOM killer terminates splunkd with kernel messages like this:

Mar 28 00:29:38 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar 28 00:29:44 splunk-es kernel: [<ffffffff853ba4e4>] oom_kill_process+0x254/0x3d0
Mar 28 00:29:44 splunk-es kernel: [<ffffffff853b9f8d>] ? oom_unkillable_task+0xcd/0x120
Mar 28 00:29:45 splunk-es kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 28 00:29:45 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
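To confirm which process triggered the OOM killer, the offending line can be pulled out of the kernel log. A minimal sketch (the sample line is copied from the messages above; on a live host you would pipe in `dmesg -T` or read /var/log/messages instead):

```shell
# Sample kernel log line from the post above
line='Mar 28 00:29:38 splunk-es kernel: splunkd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0'

# Extract the name of the process that invoked the OOM killer
proc=$(echo "$line" | sed -n 's/.*kernel: \([^ ]*\) invoked oom-killer.*/\1/p')
echo "$proc"   # prints: splunkd
```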

This just started today, in the middle of troubleshooting some search issues.

1 Solution

sylim_splunk
Splunk Employee

It turns out to have been caused by a debug option, debug_metrics=true; the symptoms disappeared after debug_metrics was reverted to false.
Setting 'debug_metrics=true' greatly expands the amount of performance data the search process collects in info.csv, because it breaks the data down per indexer, and this environment has hundreds of indexers across multiple clusters. No aggregation occurs, so the data for every individual indexer is preserved; the perf data grows a hundredfold or more, and the search head then loads all of it for analysis.
Until a fix is available, please monitor the memory usage of splunkd whenever 'debug_metrics = true' is set in limits.conf.
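The workaround described above amounts to reverting (or simply never enabling) the setting in limits.conf on the search head. A sketch, assuming the setting lives under the [search_metrics] stanza (verify the stanza name against your version's limits.conf.spec):

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf on the search head
[search_metrics]
# Leave per-indexer perf-data breakdown disabled (the default) so that
# info.csv stays aggregated and the search head's memory use stays bounded.
debug_metrics = false
```

A restart of splunkd (or a debug/refresh of the relevant config endpoints) is typically needed for limits.conf changes to take effect.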

