I am onboarding a JSON dataset whose events are very close to 1 MB each, so I had to increase TRUNCATE to 1000000 (from the default of 10000):
TRUNCATE = 1000000
KV_MODE = json
But when I run a search on this sourcetype, splunkd's memory demand skyrockets, causing oom_killer to kill random processes and effectively bringing down the OS.
I see that some default sourcetypes (etc/system/default/props.conf) already have TRUNCATE=1000000, but they use INDEXED_EXTRACTIONS=json. Is index-time extraction the only or recommended way to handle such exceedingly large events? Is there a formula for estimating search-time memory needs based on event size?
1. 1MByte events are... huge. Whether it is kv-json or plain regex-based extractions, it's gonna be heavy.
I think automatic search-time extraction (KV_MODE = json) has to allow for a lot of contingencies, so it holds much more data in memory, and that is what causes the memory pressure. After I removed KV_MODE, searches ran without problems. I then applied | spath inline, again with no problem at all. It is actually very performant.
I am starting to learn the difference between implied actions and explicit evaluation actions. (See a Slack thread about tojson and eval. In this case, spath behaves like eval.) I am guessing there is a good reason why certain implied actions consume so many more resources. Maybe that is why those large-event default sourcetypes use index-time extraction instead.
In the end, index-time extraction and inline spath are the only workable options for such sourcetypes.
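For anyone landing here later, the two setups that worked could look roughly like this in props.conf. This is only a sketch, not a tested configuration: the sourcetype names are hypothetical, the two stanzas are alternatives rather than something to combine, and the explicit KV_MODE = none is my assumption (in my case I simply deleted the KV_MODE line, which leaves the default of auto).

```ini
# props.conf: sketch only; both sourcetype names are hypothetical

# Option A: no automatic JSON extraction at search time.
# Fields are extracted on demand with "| spath" in the search itself.
[my_big_json]
TRUNCATE = 1000000
# Assumption: disable automatic key/value extraction explicitly.
KV_MODE = none

# Option B: index-time extraction, as the large-event default sourcetypes do.
[my_big_json_indexed]
TRUNCATE = 1000000
INDEXED_EXTRACTIONS = json
# Commonly paired with INDEXED_EXTRACTIONS so fields are not extracted twice.
KV_MODE = none
```

With option A, a search would then look something like: sourcetype=my_big_json | spath
Option B trades search-time memory for a larger index, since the extracted fields are stored at index time.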
1. 1MByte events are... huge. Whether it is kv-json or plain regex-based extractions, it's gonna be heavy.
2. As a side note: if splunkd brings down the whole OS, it might be time to tweak the VMM parameters (swappiness, zram, oom killer priorities...).
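As a rough illustration of the kind of tweaks meant in point 2, something like the following could be a starting point. Everything here is an assumption to adapt: the file names are hypothetical, the values are illustrative and not recommendations, and the unit name Splunkd.service assumes a systemd-managed Splunk install.

```ini
# /etc/sysctl.d/90-splunk-vm.conf  (hypothetical file name)
# Lower values make the kernel less eager to swap; 10 is illustrative only.
vm.swappiness = 10

# /etc/systemd/system/Splunkd.service.d/oom.conf  (hypothetical drop-in, unit name assumed)
# A positive score makes the OOM killer prefer splunkd over other processes,
# so a runaway search is less likely to take unrelated services down with it.
[Service]
OOMScoreAdjust=500
```

The sysctl file would be loaded with sysctl --system, and the drop-in takes effect after systemctl daemon-reload and a restart of the service.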