Splunk is intermittently failing to automatically extract fields in the regular foo=bar format. For example, in this event:
Jan 9 11:33:37 sv121-mw4 [mw2] INFO auth_id="000767E10050" eventTime="1389227425697" household_id="5c8977b2-7f49-11df-a4df-001321c9413d" partner="partner1" pid="13353" uri="/v2/events" event="applicationOpened" mac="00:07:67:E1:00:50" application="YouTube" request_id="adc211e2-78c5-11e3-b292-3c4a92ebea90" version="2.767.cf97ae4" https="true" billing_partner="partner1" duration="16.88" serial="660589501000016" debugEvent="True" remote_ip="100.64.10.309"
all the fields were extracted except 'application'.
I don't think it's a limits thing because in limits.conf in the kv stanza we have limit = 250 and maxcols = 512 and there definitely aren't that many fields in the results of the search.
I haven't been able to find any pattern as to which fields don't get extracted or when. No single field is missing across all searches, but whenever I re-run the same search, it is always the same field that doesn't get extracted.
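One way to double-check the limits reasoning outside of Splunk is to count the distinct field names in a sample of raw events and compare that count with the limit value from the kv stanza. The following is only a rough sketch: the regular expression is a simplified key="value" matcher (an assumption, not Splunk's own extraction rule), the helper name is made up for illustration, and the event is a shortened copy of the one above.

```python
import re

# Simplified key="value" matcher; an assumption, not Splunk's own extraction rule.
KV_PATTERN = re.compile(r'(\w+)="[^"]*"')

# Value of "limit" in the [kv] stanza of limits.conf, as quoted in the question.
KV_LIMIT = 250


def distinct_field_names(raw_events):
    """Collect every distinct key seen in key="value" pairs across raw events."""
    names = set()
    for raw in raw_events:
        names.update(KV_PATTERN.findall(raw))
    return names


# Shortened copy of the event from the question.
raw_events = [
    'Jan 9 11:33:37 sv121-mw4 [mw2] INFO auth_id="000767E10050" '
    'event="applicationOpened" mac="00:07:67:E1:00:50" '
    'application="YouTube" duration="16.88" debugEvent="True"',
]

names = distinct_field_names(raw_events)
print(len(names), "distinct fields; limit is", KV_LIMIT)
print("under limit:", len(names) < KV_LIMIT)
```

If the count over a representative export of the search results stays far below the limit, that supports ruling out limits.conf as the cause.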
To help diagnose what's going on, have you tried
./splunk cmd btool props list --debug | less
then, at the prompt, you can enter /syslog to jump to the beginning of the syslog stanza.
This will show you all the props.conf settings related to syslog.
The Splunk on Splunk app (SOS) can also give you an overall view of the settings related to a particular sourcetype.
Remember that field extraction may vary based on the app context (i.e., workspace) that you are using for the search.
Using SOS I've confirmed that the only non-default attributes in the [syslog] stanza of props.conf are EVALs, LOOKUPs and a couple of EXTRACTs that shouldn't impact the missing fields.
Further, when I append | extract pairdelim=" ", kvdelim="=" to my searches, the fields that weren't extracted are now extracted. Aren't those settings the same as auto extraction though?
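To make the two delimiters concrete, here is a rough illustration of delimiter-based splitting: break the event on the pair delimiter, then break each token on the first key/value delimiter. This is a sketch of the general idea only, not Splunk's implementation of extract or of automatic extraction, and the function name is hypothetical.

```python
import shlex


def split_pairs(raw_event, pairdelim=" ", kvdelim="="):
    """Split on pairdelim, then split each token on the first kvdelim."""
    # shlex keeps quoted values together and strips the surrounding quotes.
    # Note: an unbalanced quote in the event raises ValueError.
    lexer = shlex.shlex(raw_event, posix=True)
    lexer.whitespace = pairdelim
    lexer.whitespace_split = True
    lexer.commenters = ""  # do not treat '#' as the start of a comment

    fields = {}
    for token in lexer:
        key, sep, value = token.partition(kvdelim)
        if sep and key:  # skip tokens such as "INFO" that have no delimiter
            fields[key] = value
    return fields


print(split_pairs('INFO application="YouTube" pid="13353"'))
```

Tokens without a key/value delimiter, such as the timestamp parts and the log level, are simply skipped.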
I am also experiencing this exact set of symptoms. I have seen it happen with many different custom sourcetypes, so I do not think it is related to any manipulation of the syslog sourcetype that Splunk does. Does anyone have an explanation for this?
I thought more about your statement "I'm pretty sure we haven't done any local defining of the sourcetype definition so it's all as per default."
Actually, Splunk does a fair amount of manipulation of the syslog sourcetype by default. Other apps may as well. As I don't see anything wrong with either the events or the search, I think that is where I would look next.
As a test, what happens if you load some of this data but change the sourcetype in inputs.conf to something else? Try this out on a test instance somewhere...
Every event in this search has an application field. I have application saved as a selected field, so based on what I've seen with other searches and fields, it should show as a selected field regardless of how many events it appears in.
An example search that returns events like this but doesn't extract the application field is:
Thanks very much for your help.
Hmmm - does every event have an application field and value? The fields sidebar shows only fields that appear in over 50% of the results. If you go to "All fields", the pop-up only shows fields that appear in at least 1% of the results.
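If you can export the extracted fields per event, the coverage idea is easy to check by hand. The sketch below computes the fraction of events that contain each field and lists the fields that clear a given cutoff. The 50% and 1% cutoffs are the ones mentioned above; the function names and the sample data are made up for illustration, and the comparison is a simple "at least" check rather than Splunk's exact rule.

```python
from collections import Counter


def field_coverage(events):
    """Return {field name: fraction of events that contain the field}."""
    counts = Counter()
    for fields in events:
        counts.update(fields.keys())
    total = len(events)
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def visible_fields(coverage, min_fraction):
    """List the fields whose coverage is at least min_fraction."""
    return sorted(name for name, frac in coverage.items() if frac >= min_fraction)


# Hypothetical extraction results: one dict of extracted fields per event.
events = [
    {"application": "YouTube", "pid": "13353"},
    {"pid": "13353"},
    {"pid": "13353"},
    {"pid": "13353"},
]

coverage = field_coverage(events)
print(visible_fields(coverage, 0.5))   # cutoff described for the sidebar
print(visible_fields(coverage, 0.01))  # cutoff described for "All fields"
```

In this made-up sample, application is present in one event out of four, so it clears the 1% cutoff but not the 50% one.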
Also, it would be interesting to see the search that fails to return the
Sourcetype is syslog.
disabled = false
followTail = 0
host = sv121-mw4
sourcetype = syslog
blacklist = .(gz|bz2|z|zip)$
I'm pretty sure we haven't done any local defining of the sourcetype definition so it's all as per default.