Getting Data In

splunkd causes OutOfMemory when searching data with large event size

yuanliu
SplunkTrust

I am onboarding a JSON dataset whose event size is very close to 1MB, so I have to increase TRUNCATE to 1000000 (from the default of 10000).

TRUNCATE = 1000000
KV_MODE = json

But when I perform a search on this sourcetype, splunkd's memory demand skyrockets, causing oom_killer to kill random processes and effectively bringing down the OS.

I see some default sourcetypes (etc/system/default/props.conf) already have TRUNCATE=1000000, but they use INDEXED_EXTRACTIONS=json.  Is index-time extraction the only/recommended way to handle these exceedingly large events?  Is there some formula to determine search-time memory need based on event size?
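For context, my current stanza is the first one below; the index-time alternative would presumably look like the second (sourcetype names are made up for illustration):

```
[big_json]
# current: search-time extraction, triggers the memory pressure
TRUNCATE = 1000000
KV_MODE = json

[big_json_indexed]
# alternative, as seen in etc/system/default/props.conf
INDEXED_EXTRACTIONS = json
TRUNCATE = 1000000
# KV_MODE should be none when fields are already extracted at
# index time, to avoid extracting them a second time at search time
KV_MODE = none
```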


PickleRick
SplunkTrust

1. 1MByte events are... huge. Whether it is kv-json or plain regex-based extractions, it's gonna be heavy.

2. As a side note - if splunkd brings down the whole OS, it might be time to tweak the VMM parameters (swappiness, zram, oom_killer priorities...).
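For illustration only (the values are examples to start from, not recommendations):

```
# /etc/sysctl.d/99-vm-tuning.conf - example values, tune per host
vm.swappiness = 10    # keep anonymous pages in RAM longer before swapping
```

The OOM killer's choice of victim can also be biased per process by writing to /proc/&lt;pid&gt;/oom_score_adj (range -1000 to 1000; lower means less likely to be killed), so that a runaway search, rather than an unrelated system process, is the one reclaimed.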

yuanliu
SplunkTrust

1. 1MByte events are... huge. Whether it is kv-json or plain regex-based extractions, it's gonna be heavy.

I think search-time extraction (aka KV_MODE) has to allow for a lot of contingencies, so it holds a lot more data in memory; that is what causes the memory pressure.  After taking away KV_MODE, the search runs with no problem.  I then apply | spath inline - no problem at all.  It is actually very performant.
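To be concrete, with KV_MODE removed from props.conf, the working search looks roughly like this (index, sourcetype, and field names are placeholders):

```
index=my_index sourcetype=big_json
| spath
| stats count by status
```

This way the JSON extraction happens only when explicitly requested in the search, rather than automatically for every event the search touches.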

I am starting to learn some differences between implied actions and explicit evaluation actions. (See a Slack thread about tojson and eval; in this case, spath behaves like eval.)  I am guessing that there is a good reason why certain implied actions consume so much more resources.  Maybe that's why those large-event default sourcetypes use index-time extraction instead.

In the end, index-time extraction and inline spath seem to be the only viable options for such sourcetypes.
