At search time, several fields get extracted more than once, even though they exist only once in the event.
I know I can dedup the search, but that fights the symptom rather than solving the problem.
The question is: which config do I have to change to get this fixed?
Issue:
The fields "url" and "timestamp" show up twice with the same value in the search results:
timestamp = 2015-08-20T12:03:33Z timestamp = 2015-08-20T12:03:33Z
url = http://www.switch.ch/ url = http://www.switch.ch/
Partial example event (in the log it is on one line):
{
<other stuff>
<other stuff>
<other stuff>
<other stuff>
<other stuff>
timestamp: 2015-08-20T12:03:33Z
<other stuff>
url: http://www.switch.ch/
<other stuff>
}
Current props.conf:
[sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true
Okay, I think I have managed to fix it now:
INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX = "timestamp"
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
This seems to extract and index the JSON fields at index time, so no search-time extraction is needed.
With TIME_PREFIX I think I can also reduce the lookahead.
Thanks all
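A note for anyone applying this fix on more than one Splunk instance: as far as I understand, these settings do not all take effect in the same place. INDEXED_EXTRACTIONS is applied where the file is read (typically the forwarder), while KV_MODE and AUTO_KV_JSON are search-time settings and need to be on the search head. The split below is only a sketch under that assumption; the [sourcetype] stanza name is the placeholder used above, and on a single-instance install everything can stay in one stanza.

```ini
# props.conf on the instance that reads the file (index-time part)
[sourcetype]
INDEXED_EXTRACTIONS = json
TIMESTAMP_FIELDS = timestamp
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false

# props.conf on the search head (search-time part)
[sourcetype]
# Switch off search-time JSON extraction so the indexed fields
# are not extracted a second time
KV_MODE = none
AUTO_KV_JSON = false
```

If the duplicates remain after a change like this, it may be worth checking that the search-time stanza really is deployed on the search head and not only on the forwarder or indexer.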
I guess the problem is with the field extraction you're doing. Based on your sourcetype definition, you're using both INDEXED_EXTRACTIONS (index-time field extraction) and KV_MODE (search-time field extraction), so every field gets extracted twice. I would recommend using search-time field extraction only, so try this for your sourcetype definition:
[sourcetype]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true
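One caveat with the stanza above: to my knowledge, TIMESTAMP_FIELDS only takes effect together with INDEXED_EXTRACTIONS, so a search-time-only sourcetype probably needs the timestamp located with TIME_PREFIX and TIME_FORMAT instead. The following is a sketch, assuming the raw event contains the key and value in standard JSON quoting, i.e. "timestamp": "2015-08-20T12:03:33Z" (the example event in the question is abbreviated, so the exact quoting is a guess):

```ini
# props.conf - search-time-only JSON extraction (sketch, adjust to your data)
[sourcetype]
# Fields are extracted at search time only
KV_MODE = json
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
# Assumption: key and value are double-quoted as in standard JSON
TIME_PREFIX = "timestamp"\s*:\s*"
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
# The trailing Z is matched literally, so the zone is set explicitly
TZ = UTC
# Counted from the end of the TIME_PREFIX match; the timestamp is 20 characters
MAX_TIMESTAMP_LOOKAHEAD = 20
category = Structured
pulldown_type = true
```

With this variant only the default fields (_time, source, host, sourcetype) are written at index time, at the cost of parsing the JSON on every search.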
Could you please check the question that has the same issue?
Thanks for your help.
I think this topic, which I have just found, covers it better: http://answers.splunk.com/answers/223095/why-is-my-sourcetype-configuration-for-json-events.html
I caused the issue by using
INDEXED_EXTRACTIONS = json
KV_MODE = json
Changing to
INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
fixed it, but now I wonder whether I am indexing all the JSON fields (which might add quite a lot of indexing overhead) instead of only _time, source, host and sourcetype.
I think it's extracting OK, but Splunk has already done the timestamp extraction automatically on top of what you specified, hence the duplication. Could you please try:
# props.conf
[sourcetype]
NO_BINARY_CHECK = 1
TIME_PREFIX = "timestamp"
pulldown_type = 1
KV_MODE = json
# Sometimes below is required.
# BREAK_ONLY_BEFORE = (^{)
Okay, I will try that.
I also found the TIME_PREFIX option, but I did not use it because it does not explain why the url gets extracted twice.