I have a Python script configured as a data input that generates one JSON object per line containing events. This is how I configured props.conf for the source type:
[mysourcetype]
INDEXED_EXTRACTIONS = JSON
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y%m%d
TZ = UTC
detect_trailing_nulls = auto
SHOULD_LINEMERGE = false
description = My source type
pulldown_type = true
disabled = false
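For reference, here is a minimal sketch of the kind of scripted input described above. The actual script is not shown in the question, so every field other than date is illustrative; the point is that each line is a standalone JSON object whose date field matches TIMESTAMP_FIELDS = date and TIME_FORMAT = %Y%m%d:

```python
import json
import sys
from datetime import datetime, timezone

# Emit one JSON object per line; the "date" field matches
# TIMESTAMP_FIELDS = date and TIME_FORMAT = %Y%m%d in props.conf.
events = [
    {"date": datetime.now(timezone.utc).strftime("%Y%m%d"),
     "level": "info",          # illustrative field
     "message": "heartbeat"},  # illustrative field
]
for event in events:
    sys.stdout.write(json.dumps(event) + "\n")
```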
However, what is happening is as follows:
- Each event's _raw contains a valid JSON object, as expected.
- Every field of the JSON object was extracted under its own name, as expected.
- The event timestamp is correctly set to the date contained in the date field of the JSON object.
- Unexpectedly, every extracted field is multivalued, containing exactly two identical copies of the value present in the JSON object.
Funnily enough, if I use KV_MODE = JSON instead of INDEXED_EXTRACTIONS with the same data, everything works perfectly.
Any ideas on what might be going on?
Found it. Inspired by the comments and answer provided by @dsdb_splunkadmin and @fdi01, I found that the problem was that I was enabling index-time extractions (via INDEXED_EXTRACTIONS) but not disabling the search-time extractions that happen by default (due to the KV_MODE and AUTO_KV_JSON options). Both were occurring, generating duplicated extractions. 😞
This is what finally worked:
[mysourcetype]
INDEXED_EXTRACTIONS = JSON
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y%m%d
TZ = UTC
detect_trailing_nulls = auto
SHOULD_LINEMERGE = false
KV_MODE = none
AUTO_KV_JSON = false
Thanks, everyone, for your help.
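The duplication mechanism can be modelled in a few lines of Python. This is a toy sketch only, not Splunk's actual pipeline: the same raw JSON is parsed once at index time and once at search time, and the two result sets merge into multivalued fields.

```python
import json

raw = '{"user": "alice", "status": "ok"}'

# Index-time pass (INDEXED_EXTRACTIONS = JSON) and search-time pass
# (KV_MODE / AUTO_KV_JSON) each extract every field from _raw.
indexed = json.loads(raw)
search_time = json.loads(raw)

fields = {}
for extraction in (indexed, search_time):
    for name, value in extraction.items():
        fields.setdefault(name, []).append(value)

# Every field now holds two identical copies of the same value,
# which is the multivalued symptom described in the question.
```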
For the above Accepted Answer, I would point out:
I put the above configuration in my etc/system/local/props.conf for my Universal Forwarder installation.
I also needed to ensure that on my Splunk Cloud Light instance, for the source type "mysourcetype", the following properties were set (under "Advanced"):
INDEXED_EXTRACTIONS = json
KV_MODE = none
In fact, those two properties on the Splunk Cloud Light source type definition solved my duplication problem even without adding AUTO_KV_JSON on the forwarder side (the forwarder's config already had KV_MODE = none).
I am having a similar issue. However, I only see duplicates when running a raw search and expanding an event to look at all its fields; when I print the field using the table command, I don't see any duplicate values. Is anyone aware of this behaviour and why it happens?
Try something like this to see:
[monitor://<path to JSON>/*.JSON]
INDEXED_EXTRACTIONS = JSON
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y%m%d
TZ = UTC
detect_trailing_nulls = auto
SHOULD_LINEMERGE = false
description = JSON
pulldown_type = true
disabled = false
sourcetype = JSON
KV_MODE = JSON
index = name_your_index
crcSalt = <SOURCE>
If that is not OK for you, you can use the dedup command when you run a search to eliminate the duplicate values, and the mvexpand command to transform the multivalued fields. For example:
your_base_search_JSON| spath | eval temp=mvzip(college,mvzip(mark,studentname,"#"),"#") | mvexpand temp |......
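As a side note on what that mvzip/mvexpand chain does: SPL's mvzip pairwise-joins two multivalued fields with a delimiter, and mvexpand then fans each combined value out into its own event. A rough Python model of those semantics (mine, not Splunk's implementation; field values are illustrative):

```python
def mvzip(a, b, delim=","):
    # Like SPL mvzip: join the i-th values of two multivalued fields with delim
    return [delim.join(pair) for pair in zip(a, b)]

def mvexpand(events, field):
    # Like SPL mvexpand: emit one output event per value of the multivalued field
    out = []
    for event in events:
        for value in event.get(field, []):
            expanded = dict(event)
            expanded[field] = value
            out.append(expanded)
    return out

# Mirroring the example: mvzip(college, mvzip(mark, studentname, "#"), "#")
college = ["mit", "cmu"]
mark = ["90", "85"]
studentname = ["ann", "bob"]
temp = mvzip(college, mvzip(mark, studentname, "#"), "#")
rows = mvexpand([{"temp": temp}], "temp")
```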
Thank you for mentioning dedup; it's a valid workaround. But I'd rather import the data correctly in the first place.
However, if you keep both INDEXED_EXTRACTIONS and KV_MODE set to JSON, I would expect to get duplicated values, since Splunk would be extracting the fields both at index time and at search time.
I too have this problem. Using Splunk Cloud, if I upload a JSON file with the following settings:
INDEXED_EXTRACTIONS = json
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TIMESTAMP_FIELDS = time
category = Structured
description = JavaScript Object Notation
disabled = false
pulldown_type = true
The data is imported correctly, no duplicate values. If I upload a file via a monitor on a Universal Forwarder with the following settings:
INDEXED_EXTRACTIONS = json
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TIMESTAMP_FIELDS = time
The value for each event is duplicated. If I change to KV_MODE = json and reindex, it makes no difference for me; the values are still duplicated.
Interesting how you set KV_MODE = none; it hadn't occurred to me to do that. Reading http://docs.splunk.com/Documentation/Splunk/6.2.2/admin/Propsconf I noticed that KV_MODE defaults to auto and, more importantly, that AUTO_KV_JSON defaults to true.
In that case, it would make sense that Splunk would extract the fields both during index time and during search time, thus duplicating the values.
So maybe if I add both KV_MODE = none and AUTO_KV_JSON = false to the original props.conf file, things will work as intended. I'll try this later; if you could try it on your end as well, we could confirm whether that is the problem.
Unfortunately, having both KV_MODE=none and AUTO_KV_JSON=false together in my props.conf did not fix the issue for me.
I will do some tests to ensure the props.conf on the Universal Forwarder is definitely being applied.
Fixed it.
In my case, I had to make sure that the same sourcetype was defined on the Splunk Cloud instance and that it also had KV_MODE = none.
I had defined the type on my Universal Forwarder, but had not appreciated that some of the properties, like KV_MODE, are search-time properties, and hence have to be defined on the search instance (not just the forwarder).
I didn't have to use the AUTO_KV_JSON = false setting in the end.
You put me on the right path though with the index vs search time double indexing - thanks!
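To summarise the split that worked in this thread (a sketch only; the stanza name and file locations are taken from the posts above, and the exact Splunk Cloud UI steps may differ by version): the index-time setting lives on the forwarder, while the search-time setting must also exist where searches run.

```ini
# Universal Forwarder: etc/system/local/props.conf (index-time parsing)
[mysourcetype]
INDEXED_EXTRACTIONS = json

# Splunk Cloud / search head: same sourcetype name (search-time behaviour)
[mysourcetype]
INDEXED_EXTRACTIONS = json
KV_MODE = none
```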
Don't mention it. Actually, thank you for guiding me onto the right path by posting your example with KV_MODE = none in the first place. 🙂
In case it's important, I'm using Splunk Universal Forwarder 6.2.2 (build 255606)