Splunk Search

Why are several JSON fields getting extracted more than once at search-time?

mathiask
Communicator

At search time, several fields are extracted more than once, even though each exists only once in the event.
I know I can dedup in the search, but that treats the symptom rather than solving the problem.
The question is: which configuration do I have to change to fix this?

Issue:
The fields "url" and "timestamp" show up twice with the same value in the search
timestamp = 2015-08-20T12:03:33Z timestamp = 2015-08-20T12:03:33Z
url = http://www.switch.ch/ url = http://www.switch.ch/

Partial example event (in the log it is on a single line):
{
<other stuff>
<other stuff>
<other stuff>
<other stuff>
<other stuff>
timestamp: 2015-08-20T12:03:33Z
<other stuff>
url: http://www.switch.ch/
<other stuff>
}

[sourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true

0 Karma
1 Solution

mathiask
Communicator

Okay, I think I have now managed to fix it:

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX = "timestamp"
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured

This seems to extract and index the JSON fields at index time, so no later search-time processing is needed.
With TIME_PREFIX set, I think I can reduce the lookahead.

Thanks all
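
A hedged note on where these settings take effect, as a sketch rather than a definitive layout: as far as the props.conf documentation describes it, INDEXED_EXTRACTIONS and TIMESTAMP_FIELDS are applied when the structured data is first read (with a universal forwarder, that is the forwarder itself), while KV_MODE and AUTO_KV_JSON are search-time settings and need to be visible to the search head. The split below assumes a distributed deployment; on a single instance, one stanza can hold everything. `[sourcetype]` is the same placeholder stanza name used in the question.

```ini
# props.conf on the instance that reads the file
# (forwarder, or the single Splunk instance)
[sourcetype]
# Parse the JSON once, at index time
INDEXED_EXTRACTIONS = json
# Take _time from the JSON field named "timestamp"
TIMESTAMP_FIELDS = timestamp
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured

# props.conf on the search head (a separate file in a distributed setup)
[sourcetype]
# Do not extract the JSON a second time at search time
KV_MODE = none
AUTO_KV_JSON = false
```

If the duplicates persist after a change like this, it may be worth checking that the search-time stanza is actually deployed to the search head and not only to the indexing tier.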

somesoni2
Revered Legend

I think the problem is with the field extraction you're doing. Based on your sourcetype definition, you're using both INDEXED_EXTRACTIONS (index-time field extraction) and KV_MODE (search-time field extraction), so every field gets extracted twice. I would recommend using search-time field extraction only, so try this for your sourcetype definition:

[sourcetype]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = -1
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = timestamp
category = Structured
pulldown_type = true
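
One caveat for the search-time-only approach, offered as an assumption to verify rather than a confirmed behavior: TIMESTAMP_FIELDS is documented alongside the structured-data (INDEXED_EXTRACTIONS) settings, so it may have no effect once INDEXED_EXTRACTIONS is removed. A minimal sketch that locates the timestamp with a regex instead might look like this. The TIME_PREFIX pattern assumes the raw event contains `"timestamp": "2015-08-20T12:03:33Z"` with a quoted key and value, and TZ = UTC assumes the trailing Z really means UTC.

```ini
# props.conf, search-time JSON extraction only
[sourcetype]
KV_MODE = json
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
# Regex that matches the text immediately before the timestamp value
# (assumed key/value quoting, adjust to the real raw event)
TIME_PREFIX = "timestamp"\s*:\s*"
# Layout of 2015-08-20T12:03:33Z, with the Z treated as a literal character
TIME_FORMAT = %Y-%m-%dT%H:%M:%SZ
# The value is 20 characters long, so a short lookahead is enough
MAX_TIMESTAMP_LOOKAHEAD = 25
# Assumption: timestamps are in UTC
TZ = UTC
```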


mathiask
Communicator

Thanks for your help.
I think this topic, which I have now found, covers it better: http://answers.splunk.com/answers/223095/why-is-my-sourcetype-configuration-for-json-events.html
I created the issue by using

INDEXED_EXTRACTIONS = json
KV_MODE = json

Changing to

INDEXED_EXTRACTIONS = json
KV_MODE = none
AUTO_KV_JSON = false

That fixed it, but now I wonder whether I am currently indexing all the JSON fields (which might add quite a lot of indexing overhead) instead of only _time, source, host, and sourcetype.

0 Karma

koshyk
Super Champion

I think it's extracting OK, but Splunk has already done the timestamp extraction automatically on top of what you specified, hence the duplication. Could you please try this:

# props.conf   
[sourcetype]
NO_BINARY_CHECK = 1
TIME_PREFIX = "timestamp"
pulldown_type = 1
KV_MODE = JSON
# Sometimes below is required.
# BREAK_ONLY_BEFORE = (^{)
0 Karma

mathiask
Communicator

Okay, I will try that.
I also found the TIME_PREFIX option, but I did not use it because it does not explain why the url gets extracted twice.

0 Karma