Why are my json fields extracted twice?

yannK
Splunk Employee

I have JSON events like: {"A":"1","B":"2","C":"3"}
with a sourcetype named json_app.

When I search, every extracted field shows 2 values, although source, sourcetype, index, and _time are not duplicated.
For example: A = "1, 1", like a multivalue field.

This started with Splunk 6.1.


yannK
Splunk Employee

Found the explanation: this is the new "INDEXED_EXTRACTIONS" feature.
http://docs.splunk.com/Documentation/Splunk/6.1.4/Data/Extractfieldsfromfileheadersatindextime

It extracts the fields at index time, during parsing, which means that the FORWARDERS parse the events when the setting is specified there.
Remark: it also means that for these structured events, the timestamp is now extracted by the forwarder, and the filtering is done by the forwarders too.

Example props.conf on the forwarder:

[json_app]
INDEXED_EXTRACTIONS=json

My problem was that I also had an automatic search-time extraction of the fields for JSON data:

[json_app]
KV_MODE=json

So in the end every field was extracted twice: once at index time and once at search time.

To fix the issue, I simply disabled KV_MODE on the search head and reloaded the configuration with the search command | extract reload=true

Here is the final props.conf for my sourcetype:

[json_app]
INDEXED_EXTRACTIONS=json
KV_MODE=none
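
In a distributed deployment, the two settings above can be split by tier instead of living in one stanza. This is only a sketch under assumptions: the file locations in the comments are illustrative, and AUTO_KV_JSON (which also appears in a later reply in this thread) is an optional extra that may not exist in older versions, so check the props.conf spec for your release before relying on it.

```ini
# props.conf on the forwarder that monitors the JSON files
# (illustrative location: $SPLUNK_HOME/etc/apps/<your_app>/local/props.conf)
[json_app]
# Structured parsing happens here, so the fields are written at index time.
INDEXED_EXTRACTIONS = json

# props.conf on the search head
# (illustrative location: $SPLUNK_HOME/etc/apps/<your_app>/local/props.conf)
[json_app]
# Turn off search-time JSON extraction so the indexed fields are not extracted a second time.
KV_MODE = none
# Optional, and an assumption for your version: also disable automatic JSON key-value extraction.
AUTO_KV_JSON = false
```

After changing the search-head stanza, the | extract reload=true search mentioned above reloads the search-time settings. Changes on the forwarder only affect events indexed after the change.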

PatG_
Engager

Thanks a lot!
I was going crazy trying to understand why that was happening.


vinit_masaun
Explorer

I have the same problem: I am getting duplicated field values from my JSON logs. I have a universal forwarder that sends data to a heavy forwarder, which then sends it to the indexers. I have the following props.conf at each layer, with INDEXED_EXTRACTIONS set to 'json' on the universal forwarder and to 'none' at every other layer (heavy forwarder, indexer, and search head). I don't understand why the fields are still being extracted twice:

universal forwarder props.conf:
INDEXED_EXTRACTIONS = json
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Heavy forwarder props.conf:
INDEXED_EXTRACTIONS = none
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Indexer props.conf:
INDEXED_EXTRACTIONS = none
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false

Search Head props.conf:
INDEXED_EXTRACTIONS = none
KV_MODE = none
CHARSET = UTF-8
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
TRUNCATE = 500000
pulldown_type = true
category = Structured
description = CAP - Ramp Document Monitoring
AUTO_KV_JSON = false


Anam
Community Manager

Hi Vinit

It would be better to post this as a new question. This thread is 3 years old, so your question might not get much visibility here.

Thanks


laserval
Communicator

Remark : It also means that for those special events, the timestamp is now extracted by the forwarder, and that the filtering is done by the forwarders

This sounds confusing. Do you mean light/heavy forwarders or universal forwarders? The wiki (http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings%3F) mentions the same thing: INDEXED_EXTRACTIONS is applied at the input stage. That seems like an unfortunate side effect. Have you seen any increased workload on the UF?
