We have a log file with multiple lines of JSON similar to this:
{ "foo": "bar","foo1":"foo2","userEmail":"foo@bar.com"}
{ "foo": "bar","foo1":"foo2","userEmail":"foo1@bar.com"}
{ "foo": "bar","foo1":"foo2","userEmail":"foo2@bar.com"}
Search-time extraction works fine for almost all of the fields... except one! Oddly, around 7-8% of all events do not have userEmail automatically extracted, according to the Event Coverage view, even after I manually defined the extraction in props.conf. I verified this with the queries:
index=foo | search userEmail=*
index=foo | search NOT userEmail=*
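For reference, a quick way to quantify the gap in a single search (a sketch using the index and field names from the examples above):
index=foo | eval has_email=if(isnotnull(userEmail), "extracted", "missing") | stats count by has_email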
Events are sent from a forwarder with this props.conf:
[foo]
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y%m%d%H%M%S%3N
TIME_PREFIX = \"timestamp\":\"
TZ = UTC
KV_MODE = json
disabled = false
TRUNCATE = 0
I added these on the search head earlier today to force search-time extraction for userEmail, but it didn't work, even though I verified in Splunk Web that the regex catches every email:
[foo]
EXTRACT-userEmail = "userEmail":"(?P<userEmail>[^"]+)
KV_MODE = json
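One way to isolate the affected events is to re-run the same regex inline with rex and compare it against the automatic extraction (a sketch; it should return only events where the inline match succeeds but the extracted field is missing):
index=foo | rex "\"userEmail\":\"(?P<userEmail_test>[^\"]+)\"" | search userEmail_test=* NOT userEmail=*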
Any idea why this might happen?
Turns out the issue was caused by an automatic lookup that used an outdated CSV lookup file where userEmail was one of the autolookup fields. Updating the CSV fixed my issue.
In this case, it appears that all userEmails that did exist in the lookup table would autolookup and extract correctly, while userEmails that were not present in the CSV would fail the autolookup and, for whatever reason, also break the field extraction.
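For anyone hitting the same symptom: an automatic lookup defined with OUTPUT can overwrite a field that was already extracted at search time, so events that miss the lookup can end up with the field blanked out. Defining the output with OUTPUTNEW keeps any existing value instead. A hypothetical sketch (the lookup name and key field are made up, not from my actual config):
[foo]
# OUTPUTNEW only fills userEmail when the event doesn't already have a value;
# plain OUTPUT would replace the extracted value with the lookup result.
LOOKUP-user_map = user_map_lookup sessionId OUTPUTNEW userEmail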
@zanglang If your problem is resolved, please accept an answer to help future readers.
Open a support case.
I'm experiencing the same thing.
I have JSON-formatted data from the NetApp ONTAP add-on that contains a pool_id field. If I search the correct index and sourcetype and add "| extract" or "| spath", pool_id gets extracted correctly; otherwise, everything appears to be extracted except this one field. Scratching my head...
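For reference, a sketch of the comparison I ran (the index and sourcetype are placeholders for whatever the ONTAP add-on writes to):
index=<netapp_index> sourcetype=<ontap_sourcetype> | spath path=pool_id output=pool_id_check | where isnotnull(pool_id_check) AND isnull(pool_id)
Any event this returns has pool_id in the raw JSON but no automatic extraction for it.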
So it turns out that my problem was caused by a FIELDALIAS setting that was overwriting a field that actually existed in the data with a field that didn't exist. If I had run btool as suggested by @wenthold, I would have found this much faster.
It turns out that if the add-on's props.conf had used "ASNEW" instead of "AS" in the FIELDALIAS definition, Splunk would have kept the field extraction it found in the data rather than overwriting it with a field that didn't exist. The update I made to local/props.conf for that add-on was:
FIELDALIAS-array_id = uuid ASNEW array_id
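Presumably the add-on's shipped default was something like the line below, which overwrites array_id even when uuid is absent from the event (I'm inferring the original line; only the ASNEW version above is from my actual fix):
FIELDALIAS-array_id = uuid AS array_id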
Have you tried btool? If you're running on Linux, I would try something like:
$SPLUNK_HOME/bin/splunk btool props list --debug | grep "userEmail"
Field aliasing, calculated fields, and lookups are all performed at search time after KV extractions; maybe there's an errant config that's overwriting your field.
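To check for those specifically, you can narrow the same btool output to the config classes that run after extraction (using the foo sourcetype stanza from above):
$SPLUNK_HOME/bin/splunk btool props list foo --debug | grep -E "FIELDALIAS|LOOKUP|EVAL"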