Solved: Field extraction a problem with large events?

wmosher · 05-14-2012

Automatic field extraction doesn't appear to always work beyond a certain number of characters into a (single-line) event. One of our data types always ends in a key=value timing of how long the transaction took to process. It appears that after around 10,000 bytes of data the field does not get extracted automatically. The event is not visibly truncated, and I can see the field we want in _raw. I can even rex it out just fine. Is there a setting somewhere that tells the field extractor how deep into an event to look?

dwaddle · 05-14-2012

Yes, there is a limit for auto-kv extraction. The default is 10,240 chars. To change this, add/edit $SPLUNK_HOME/etc/system/local/limits.conf with this stanza/setting:

[kv]
# truncate _raw to this size and then do auto KV
# 20480, or whatever value you otherwise desire
maxchars = 20480

I'm not sure how far I would be willing to turn this setting up. At some point, it could begin to negatively impact search performance. You may see a small increase in CPU usage during searches as you raise this.

dwaddle · 05-25-2012

Well, I would not do index-time extraction. Perhaps a search-time regex extraction, but it's difficult to say without measurement whether it will be better than auto-kv. Theoretically, increasing the limit from 10,240 to 102,400 increases the CPU spent on auto-kv by 10x (assuming an O(n) operation). Practically, this may only mean a handful of nanoseconds. One advantage of regex over auto-kv is that you can limit the regex's scope to particular sourcetypes, whereas raising the kv limit affects processing for every event. The best advice is "measure and compare".
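
For what it's worth, a sourcetype-scoped search-time extraction might look roughly like the props.conf sketch below. The sourcetype name (my_txn_sourcetype) and the field name (elapsed) are made up for illustration, and the regex assumes the timing value is a plain integer at the very end of the event, so adjust both to match your data. I have not measured this against auto-kv, and it is worth checking that it actually fires on your longest events.

```
# props.conf (e.g. in $SPLUNK_HOME/etc/system/local/ or an app's local directory)
# Hypothetical sourcetype name; replace with your own.
[my_txn_sourcetype]
# Search-time extraction that applies only to this sourcetype.
# Assumes the event ends with something like "elapsed=1234" (hypothetical field name).
EXTRACT-elapsed = \belapsed=(?<elapsed>\d+)\s*$
```

Because the stanza is keyed on the sourcetype, other data is not touched by it, unlike a change to maxchars under [kv].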

wmosher · 05-22-2012

Thanks dwaddle, this is exactly what I'm looking for.

We have the occasional event above 51,200 characters. The field I am interested in is always the last thing in the event. Since performance could be a concern, would it make more sense to extract that field at index time with a transform, or would that be just as taxing?
