Solved: Why are several JSON fields getting extracted more than once at search-time?

mathiask · ‎08-20-2015

At search time, several fields are extracted more than once, even though each exists only once in the event.
I know I can dedup the search, but that fights the symptom instead of solving the problem.
The question is: which config do I have to change to fix this?

Issue:
The fields "url" and "timestamp" show up twice with the same value in the search results:
timestamp = 2015-08-20T12:03:33Z timestamp = 2015-08-20T12:03:33Z
url = http://www.switch.ch/ url = http://www.switch.ch/

Partial example event (in the log it is on a single line):
{
<other stuff>
<other stuff>
<other stuff>
<other stuff>
<other stuff>
timestamp: 2015-08-20T12:03:33Z
<other stuff>
url: http://www.switch.ch/
<other stuff>
}

[sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true

mathiask · ‎08-20-2015

Okay, I think I have managed to fix it now:

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX = "timestamp"
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured

This seems to extract and index the JSON fields at index time, so no search-time processing is needed.
With TIME_PREFIX set, I think I can reduce the lookahead.
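
A quick way to spot the conflicting combination in a props.conf is a small check like the one below. This is only a sketch: the function name `find_double_extraction` and the stanza name `fixed_sourcetype` are made up for illustration, and Python's `configparser` only approximates how Splunk reads .conf files (it knows nothing about layered configuration precedence). It flags only the explicit pairing discussed in this thread.

```python
import configparser


def find_double_extraction(props_text):
    """Return stanza names that set both INDEXED_EXTRACTIONS = json and KV_MODE = json."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names exactly as written (no lowercasing)
    parser.read_string(props_text)

    flagged = []
    for stanza in parser.sections():
        settings = parser[stanza]
        indexed = settings.get("INDEXED_EXTRACTIONS", "").strip().lower()
        kv_mode = settings.get("KV_MODE", "").strip().lower()
        # Index-time and search-time JSON extraction both enabled for this stanza.
        if indexed == "json" and kv_mode == "json":
            flagged.append(stanza)
    return flagged


# Hypothetical props.conf content: the first stanza mirrors the original
# question, the second mirrors the fix.
sample = """
[sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = json

[fixed_sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
"""

print(find_double_extraction(sample))
```

Only the stanza that enables both extraction passes is reported; the stanza with KV_MODE = none passes the check.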

Thanks all

somesoni2 · ‎08-20-2015

I guess the problem is with the field extraction you're doing. Based on your sourcetype definition, you're using both INDEXED_EXTRACTIONS (index-time field extraction) and KV_MODE (search-time field extraction). With both enabled, every field gets extracted twice. I would recommend using search-time field extraction, so try this for your sourcetype definition:

[sourcetype]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true
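
The double extraction described above can be illustrated with a toy model. This is not how Splunk is implemented; it is a minimal sketch that assumes each enabled extraction pass contributes one value per JSON key, and the function name `extract_fields` is made up for illustration.

```python
import json
from collections import defaultdict


def extract_fields(raw_event, indexed_extractions, kv_mode):
    """Toy model: each enabled extraction pass adds one value per JSON key."""
    fields = defaultdict(list)
    parsed = json.loads(raw_event)

    if indexed_extractions == "json":  # index-time pass
        for key, value in parsed.items():
            fields[key].append(value)

    if kv_mode == "json":  # search-time pass
        for key, value in parsed.items():
            fields[key].append(value)

    return dict(fields)


event = '{"timestamp": "2015-08-20T12:03:33Z", "url": "http://www.switch.ch/"}'

both = extract_fields(event, indexed_extractions="json", kv_mode="json")
index_only = extract_fields(event, indexed_extractions="json", kv_mode="none")
search_only = extract_fields(event, indexed_extractions=None, kv_mode="json")

print(both["url"])         # two identical values, as in the question
print(index_only["url"])   # one value
print(search_only["url"])  # one value
```

With both passes enabled, each field carries the same value twice; enabling only one of them (either one) leaves a single value.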

nawazns5038 · ‎03-15-2018

Could you please take a look at this question, which has the same issue?

https://answers.splunk.com/answers/626871/double-field-extraction-for-the-json-data.html?minQuestion...

mathiask · ‎08-20-2015

Thanks for your help.
I think this topic, which I have now found, covers it better: http://answers.splunk.com/answers/223095/why-is-my-sourcetype-configuration-for-json-events.html
I caused the issue myself by using:

INDEXED_EXTRACTIONS = json
KV_MODE = json

Changing to:

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false

That fixed it, but now I wonder whether I am indexing all the JSON fields (which might add quite a lot of indexing overhead) instead of only _time, source, host, and sourcetype.

koshyk · ‎08-20-2015

I think it's extracting OK, but Splunk has already done the timestamp extraction automatically on top of what you specified, hence the duplication. Could you please try this:

# props.conf
[sourcetype]
NO_BINARY_CHECK = 1
TIME_PREFIX = "timestamp"
pulldown_type = 1
KV_MODE = JSON
# Sometimes below is required.
# BREAK_ONLY_BEFORE = (^{)

mathiask · ‎08-20-2015

Okay, I will try that.
I also found the TIME_PREFIX option,
but I did not use it because it does not explain why the url gets extracted twice.
