I'm using a Splunk Light Cloud instance to index some logs from a web application hosted on Heroku. This is a demo instance that is brand new and has not been customized. I'm using the basic syslog source type to feed a forwarder, which is all working fine.
One thing that surprised me a bit is that, without any configuration, every k=v pair is automatically being extracted as a field. For example, the standard host=xyz dyno=web.1 status=200 entries in the log are each getting extracted to the separate fields host, status and dyno. That is all well and good for those fields, but the size of our Splunk index files reported in the license cube report is currently 10 times larger than the raw data files, and I suspect part of the reason is all the extra fields getting indexed. A lot of our application URLs contain k=v type patterns in the query string, and these are getting extracted to fields that are not meaningful to us. I'd prefer to remove these fields and just extract the ones I care about using my own regular expressions.
However, I am not able to even determine where this transformation is occurring, much less stop it. From what I can tell, the standard syslog sourcetype has no transformation to perform this extraction. Is there any way to see, given a particular field, what was responsible for its extraction? Or does anyone know specifically how I can prevent these field extractions?
The solution is to set KV_MODE = none for the sourcetype. You should be able to do that in the Add Data wizard. Once you've selected the source, click "Advanced" in the "Set Source Type" screen. There, you can add a new setting called "KV_MODE" with value "none".
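For reference, a setting added this way should correspond to a props.conf stanza roughly like the sketch below. It assumes the stock syslog source type is the one in use, and the EXTRACT class names (dyno, status) with their regular expressions are hypothetical examples, not taken from any actual configuration. KV_MODE governs automatic key-value extraction at search time, so this changes which fields show up in search results.

```
# props.conf (sketch; stanza name assumes the default syslog source type)
[syslog]
# Turn off automatic key=value field extraction at search time
KV_MODE = none
# Hypothetical search-time extractions for the fields worth keeping
EXTRACT-dyno = dyno=(?<dyno>\S+)
EXTRACT-status = status=(?<status>\d+)
```

With a stanza like this, k=v pairs in URL query strings would no longer become fields, while dyno and status would still be extracted by the explicit regular expressions.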
I did not see "Advanced" in the Add Data wizard, but it's there when I edit the source type, and adding this value to the syslog sourcetype did what I need. Thanks!
@jeremyjh, there is extra information in "Configure automatic key-value field extraction":
KV_MODE = none -- Disables field extraction for the source, source type, or host identified by the stanza name. Use this setting to ensure that other regular expressions that you create are not overridden by automatic field/value extraction for a particular source, source type, or host. Use this setting to increase search performance by disabling extraction for common but nonessential fields. We have some field extraction examples at the end of this topic that demonstrate the disabling of field extraction in different circumstances.