Solved: Why are my json fields extracted twice?

yannK · 10-22-2014

I have JSON events like: {"A":"1","B":"2","C":"3"}
with a sourcetype named json_app.

When I search, every field from the JSON has 2 values, like a multivalue field (for example A = "1, 1"), but source/sourcetype/index/_time are not duplicated.

This started in Splunk 6.1.

yannK · 10-22-2014

Found the explanation: this is the new INDEXED_EXTRACTIONS feature.
http://docs.splunk.com/Documentation/Splunk/6.1.4/Data/Extractfieldsfromfileheadersatindextime

It performs index-time extraction of the fields at parsing time, which means that when it is specified on the forwarders, the FORWARDERS parse the events.
Remark: it also means that for these events the timestamp is now extracted by the forwarder, and any filtering is done by the forwarders.

Example props.conf on the forwarder:

[json_app]
INDEXED_EXTRACTIONS=json

My problem is that I also had a search-time automatic extraction of the fields for JSON data on the search head:

[json_app]
KV_MODE=json

So in the end I got the same fields twice.
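To illustrate why the values show up as "1, 1", here is a toy Python model. It is an assumption for illustration only, not how Splunk works internally: each extraction pass is modelled as a json.loads of the raw event, and merge_passes is a hypothetical helper that collects every value per field name. Only the keys inside the JSON body get a second value, which fits the observation that source/sourcetype/index/_time were not duplicated.

```python
import json
from collections import defaultdict


def merge_passes(*passes):
    """Hypothetical helper: gather every value each extraction pass produced.

    A field produced by two independent passes ends up holding both values,
    which mirrors the multivalue A = "1, 1" symptom described above.
    """
    fields = defaultdict(list)
    for extracted in passes:
        for name, value in extracted.items():
            fields[name].append(value)
    return dict(fields)


raw_event = '{"A":"1","B":"2","C":"3"}'

# Pass 1: stands in for index-time extraction (INDEXED_EXTRACTIONS=json).
index_time = json.loads(raw_event)
# Pass 2: stands in for search-time extraction (KV_MODE=json).
search_time = json.loads(raw_event)

fields = merge_passes(index_time, search_time)
print(fields["A"])  # ['1', '1']
```

Dropping either pass (which is what KV_MODE=none does to the search-time one) leaves a single value per field.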

To fix the issue, I simply disabled KV_MODE on the search head and reloaded props.conf with the search command | extract reload=true

Here is my final props.conf for the sourcetype:

[json_app]
INDEXED_EXTRACTIONS=json
KV_MODE=none
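The fix boils down to one rule: a sourcetype should not be JSON-extracted both at index time on the forwarder and again at search time on the search head. As a minimal sketch of how you might check for that, here is a Python helper built on the standard-library configparser. The function name find_double_json_extraction, the idea of passing the two props.conf files as strings, and the assumption that props.conf is INI-like enough for configparser are all hypothetical, not a Splunk tool. It only flags an explicit KV_MODE=json on the search head paired with INDEXED_EXTRACTIONS=json on the forwarder (the combination in this thread), compares stanza names literally, and does not model Splunk defaults, app layering, or source::/host:: stanzas.

```python
import configparser


def _load(props_text):
    # Assumption: props.conf parses as INI. interpolation=None because values
    # may contain '%'; strict=False so repeated keys do not raise.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(props_text)
    return parser


def find_double_json_extraction(forwarder_props, search_head_props):
    """Hypothetical check: return stanzas that are JSON-extracted at index time
    on the forwarder AND at search time on the search head (KV_MODE=json)."""
    fwd = _load(forwarder_props)
    sh = _load(search_head_props)
    flagged = []
    for stanza in fwd.sections():
        indexed = fwd.get(stanza, "INDEXED_EXTRACTIONS", fallback="").strip().lower()
        if indexed != "json" or not sh.has_section(stanza):
            continue
        kv_mode = sh.get(stanza, "KV_MODE", fallback="").strip().lower()
        if kv_mode == "json":
            flagged.append(stanza)
    return flagged


forwarder = """
[json_app]
INDEXED_EXTRACTIONS=json
"""

before_fix = """
[json_app]
KV_MODE=json
"""

after_fix = """
[json_app]
INDEXED_EXTRACTIONS=json
KV_MODE=none
"""

print(find_double_json_extraction(forwarder, before_fix))  # ['json_app']
print(find_double_json_extraction(forwarder, after_fix))   # []
```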

PatG_ · 09-23-2016

Thanks a lot!
I was going crazy trying to understand why that was happening.

vinit_masaun · 01-25-2018

I have the same problem: I am getting duplicated field values from my JSON logs. I have a universal forwarder that sends data to a heavy forwarder, which then sends it to the indexers. I have the following props.conf at each layer: INDEXED_EXTRACTIONS is set to json on the universal forwarder and to none at every other layer (heavy forwarder, indexer, and search head). I don't understand why the fields are still extracted twice:

Universal forwarder props.conf:
INDEXED_EXTRACTIONS = json
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Heavy forwarder props.conf:
INDEXED_EXTRACTIONS = none
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Indexer props.conf:
INDEXED_EXTRACTIONS = none
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Search Head props.conf:
INDEXED_EXTRACTIONS = none
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Anam · 01-25-2018

Hi Vinit,

It would be better to post this as a new question: this thread is 3 years old, so your question might not get as much visibility here.

Thanks

laserval · 12-02-2014

"Remark: It also means that for those special events, the timestamp is now extracted by the forwarder, and that the filtering is done by the forwarders"

This sounds confusing: do you mean light/heavy forwarders or universal forwarders? The wiki (http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings%3F) says the same thing, that INDEXED_EXTRACTIONS is applied in the input stage. This seems an unfortunate side effect; have you seen any increased workload on the UF?
