I'm using a Splunk Light Cloud instance to index some logs from a web application hosted on Heroku. This is a demo instance that is brand new and has not been customized. I'm using the basic syslog source type to feed a forwarder, which is all working fine.
One thing that surprised me a bit is that, without any configuration, every k=v pair is automatically being extracted as a field. For example, the standard host=xyz dyno=web.1 status=200 entries in the log are each extracted to the separate fields host, dyno and status. That is all well and good for those fields, but the size of our Splunk index files reported in the license cube report is currently 10 times larger than the raw data files, and I suspect part of the reason is all the extra fields getting indexed. A lot of our application URLs contain k=v patterns in the query string, and these are getting extracted to fields that are not meaningful to us. I'd prefer to remove these fields and extract only the ones I care about using my own regular expressions.
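To illustrate the effect, here is a rough Python sketch of a naive k=v matcher run against a made-up router-style log line. This is only an approximation for illustration, not Splunk's actual extraction logic; the log line, the regex, and the names `KV`, `standard` and `unwanted` are all hypothetical.

```python
import re

# Hypothetical log line; the path and query parameters are invented for this example.
line = ('at=info method=GET path="/search?q=shoes&page=2" '
        'host=xyz dyno=web.1 status=200')

# Naive matcher (an assumption, not Splunk's real rule): a word-like key, "=",
# then a value running up to whitespace, "&" or a double quote.
KV = re.compile(r'(\w+)=([^\s&"]+)')

fields = dict(KV.findall(line))

# Keys that belong to the log format itself, as opposed to the query string.
standard = {"at", "method", "host", "dyno", "status"}
unwanted = sorted(set(fields) - standard)

for key in sorted(fields):
    print(f"{key} = {fields[key]}")
print("picked up from the query string:", unwanted)
```

In this sketch the query-string parameters end up as fields alongside host, dyno and status, which is the kind of noise I would like to avoid.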
However, I am not able to even determine where this transformation is occurring, much less stop it. From what I can tell, the standard syslog sourcetype has no transformation to perform this extraction. Is there any way to see, given a particular field, what was responsible for its extraction? Or does anyone know specifically how I can prevent these field extractions?