
Field extraction a problem with large events?

wmosher
Path Finder

Automatic field extraction doesn't always appear to work beyond a certain number of characters into a (single-line) event. One of our data types always ends in a key=value timing of how long the transaction took to process. It appears that after around 10,000 bytes of data the field is no longer extracted automatically. The event is not visibly truncated, and I can see the field we want in _raw. I can even rex it out just fine. Is there a setting somewhere that tells the field extractor how deep into an event to look?

1 Solution

dwaddle
SplunkTrust

Yes, there is a limit for auto-kv extraction. The default is 10,240 chars. To change this, add/edit $SPLUNK_HOME/etc/system/local/limits.conf with this stanza/setting:

[kv]
# truncate _raw to this size and then do auto KV
# 20480, or whatever value you otherwise desire
maxchars = 20480

I'm not sure how far I would be willing to turn this setting up. At some point, it could begin to negatively impact search performance. You may see a small increase in CPU usage during searches as you raise this.


dwaddle
SplunkTrust

Well, I would not do index-time extraction. Perhaps a search-time regex extraction, but it's difficult to say without measurement whether it will be better than auto-kv. Theoretically, increasing this from 10,240 to 102,400 increases the amount of CPU usage by 10x (assuming an O(n) operation). Practically, this may only mean a handful of nanoseconds. One advantage to regex versus auto-kv is that you can limit the regex scope to particular sourcetypes. Raising the kv limit affects processing for every event. Best advice is "measure and compare".
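For illustration, a search-time regex extraction scoped to a single sourcetype could look roughly like the props.conf sketch below. The sourcetype name (my_app_txn) and the field name (elapsed_ms) are hypothetical placeholders, not taken from the original data, and the pattern assumes the timing value is numeric and is the last thing in the event. Treat it as a starting point to measure against auto-kv, not as a tuned configuration.

```
# $SPLUNK_HOME/etc/system/local/props.conf (or an app's local directory)
# Hypothetical sourcetype name; replace with the real one.
[my_app_txn]
# Search-time extraction: capture the trailing key=value timing field.
# "elapsed_ms" is an assumed field name; \s*$ anchors the match to the end of the event.
EXTRACT-elapsed_ms = elapsed_ms=(?<elapsed_ms>\d+)\s*$
```

Because the stanza names one sourcetype, only events of that sourcetype pay for the extra regex, whereas raising maxchars in the [kv] stanza applies to every event that goes through auto-kv.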

wmosher
Path Finder

Thanks dwaddle, this is exactly what I'm looking for.

We have the occasional event above 51,200 characters. The field I am interested in is always the last thing in the event. Since performance could be a concern, would it make more sense to extract that field at index time with a transform, or would that be just as taxing?
