How to avoid double field extraction on a single indexed field?

Path Finder

We have the following config, which performs index-time extraction of the job field and search-time extraction of JSON events (KV_MODE=json).
fields.conf

[job]
 INDEXED=true

transforms.conf

 [my_job]
 REGEX = \"job\":\"(?<job>[^\"]+)\"
 FORMAT = job::$1
 WRITE_META = true

props.conf

 [my_json]
 KV_MODE = json
 NO_BINARY_CHECK = true
 SHOULD_LINEMERGE = false
 TIME_PREFIX = \"time\":\"
 TRANSFORMS-job = my_job
 disabled = false

Not surprisingly the job field (only) gets extracted twice, so a search with "... | table job other_field" gives results like this:

job     other_field
---     ------------
job1    other_value1
job1
job2    other_value2
job2

I have read here: http://docs.splunk.com/Documentation/Splunk/6.0/Data/Configureindex-timefieldextraction that, since "a field of the same name is extracted at search time", we should set INDEXED=false in fields.conf. This did not seem to help, even for events that were indexed after the change. Also, the fields.conf [job] setting is shared by other non-JSON source types that are working fine.

Any suggestions?

0 Karma

Apologies for resurrecting this thread, but we had a similar issue. Setting AUTO_KV_JSON=false in the corresponding sourcetype stanza of props.conf on the search head resolved it.
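
For illustration, the change described in this reply might look like the fragment below. This is only a sketch: the stanza name [my_json] is borrowed from the question and assumed to be the affected sourcetype, and the reply does not say which other settings were kept alongside it.

```
# props.conf on the search head (stanza name assumed from the question)
[my_json]
# Turn off automatic search-time JSON extraction for this sourcetype
AUTO_KV_JSON = false
```

Because AUTO_KV_JSON is a search-time setting, it should also take effect for events that are already indexed, without re-indexing.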

0 Karma

Path Finder

Thanks for responding. There is no duplication of keys. The output described above (double values for the "job" field, but not for "other_field") can be seen with data like this:

{"time":"2016-01-18T22:35:39.000Z","job":"job1","other_field":"other_value1"}
{"time":"2016-01-19T22:35:39.000Z","job":"job2","other_field":"other_value2"}

I think the problem is that the job field is extracted twice: once by our intentional index-time extraction (as shown in fields.conf, transforms.conf, and props.conf), then again at search time by KV_MODE=json. KV_MODE=json works great for all our other JSON fields, but is redundant for the "job" field, which has already been extracted.

0 Karma

Esteemed Legend

You are probably doing BOTH KV_MODE=JSON and INDEXED_EXTRACTIONS=JSON. Do only the latter.
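
A sketch of what this suggestion would look like, assuming the [my_json] sourcetype from the question. The asker's posted configuration does not contain INDEXED_EXTRACTIONS, so this fragment only illustrates the reply and is not the asker's setup.

```
# props.conf where the data is parsed as structured input (stanza name assumed)
[my_json]
INDEXED_EXTRACTIONS = json

# props.conf on the search head
[my_json]
# Disable search-time JSON extraction so fields are not extracted a second time
KV_MODE = none
```

Note that INDEXED_EXTRACTIONS = json indexes every field in the JSON event, not only selected ones.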

0 Karma

Path Finder

Thanks for responding. We are not using INDEXED_EXTRACTIONS as we have large json events with many fields and we don't want all the fields indexed. But the problem is similar to the duplicate fields folks see when using both KV_MODE=json and INDEXED_EXTRACTIONS=json...

We do intentionally index the "job" field only. And this is the only field for which we see the duplicate fields, which makes sense since the "job" field is being extracted at index time and then again at search time (with KV_MODE=json). Of course we want KV_MODE=json for search time field extraction on the many other fields of the json event.

0 Karma

Contributor

I am having the exact same issue. I intentionally index two fields (out of 50) in my json event. At search time, the KV_MODE=json does search time extraction of the same field. Did you ever get an answer @rgsage?

0 Karma

Path Finder

Thanks for the bump. No, we did not get a solution for this problem. Currently we are just living with the double extraction 😞

0 Karma

SplunkTrust

I believe you have two "job" key-value pairs in each of your JSON events...

Like this:
{ "job" : { "job" : "1", "status": "good"}}

The first job has multiple values, the 2nd has a single value.

Please give us an example of a full JSON event (redact sensitive info), so that we may assist you further.

0 Karma