Field extraction a problem with large events?

wmosher
Path Finder

Automatic field extraction doesn't always seem to work past a certain number of characters into a (single-line) event. One of our data types always ends in a key=value timing of how long the transaction took to process. It appears that after around 10,000 bytes of data, the field is not automatically extracted. The event is not visibly truncated, and I can see the field we want in _raw. I can even rex it out just fine. Is there a setting somewhere that tells the field extractor how deep into an event to look?

1 Solution

dwaddle
SplunkTrust

Yes, there is a limit for auto-kv extraction. The default is 10,240 chars. To change this, add/edit $SPLUNK_HOME/etc/system/local/limits.conf with this stanza/setting:

[kv]
# truncate _raw to this size and then do auto KV
# 20480, or whatever value you otherwise desire
maxchars = 20480

I'm not sure how far I would be willing to turn this setting up. At some point, it could begin to negatively impact search performance. You may see a small increase in CPU usage during searches as you raise this.

dwaddle
SplunkTrust

Well, I would not do index-time extraction. Perhaps a search-time regex extraction, but it's difficult to say without measurement whether it will be better than auto-kv. Theoretically, increasing this from 10,240 to 102,400 increases the amount of CPU usage by 10x (assuming an O(n) operation). Practically, this may only mean a handful of nanoseconds. One advantage to regex versus auto-kv is that you can limit the regex scope to particular sourcetypes. Raising the kv limit affects processing for every event. Best advice is "measure and compare".
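
If you try the search-time regex route, a minimal props.conf sketch might look like the one below. The sourcetype name "my_sourcetype" and the key/field name "elapsed" are placeholders I made up, since the real names aren't given in this thread, and the regex assumes the value is numeric and is the very last thing in the event.

```
# props.conf (search-time extraction, scoped to a single sourcetype)
# "my_sourcetype" and "elapsed" are hypothetical placeholders; substitute your own
[my_sourcetype]
# Anchor on end-of-event so only the trailing key=value pair is captured
EXTRACT-elapsed = \belapsed=(?<elapsed>\d+(?:\.\d+)?)\s*$
```

I have not verified whether a search-time EXTRACT is itself bound by the [kv] maxchars limit on any given version, so test this against one of your long events before relying on it, and compare search times with and without the raised limit.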

wmosher
Path Finder

Thanks dwaddle this is exactly what I'm looking for.

We have the occasional event above 51,200 characters. The field I am interested in is always the last thing in the event. Since performance could be a concern, would it make more sense to extract that field at index time with a transform, or would that be just as taxing?
