<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188562#M37606</link>
    <description>&lt;P&gt;For the above Accepted Answer, I would point out:&lt;BR /&gt;
I put the above configuration in my etc/system/local/props.conf for my Universal Forwarder installation. &lt;BR /&gt;
I also needed to ensure that on my Splunk Cloud Light instance, for the source type "mysourcetype", the following properties were set (under "Advanced"):&lt;/P&gt;

&lt;P&gt;INDEXED_EXTRACTIONS = json &lt;/P&gt;

&lt;P&gt;KV_MODE = none&lt;/P&gt;</description>
    <pubDate>Thu, 26 Oct 2017 20:04:09 GMT</pubDate>
    <dc:creator>cfoleydivert</dc:creator>
    <dc:date>2017-10-26T20:04:09Z</dc:date>
    <item>
      <title>Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188551#M37595</link>
      <description>&lt;P&gt;I have a Python script configured as a data input that generates one JSON object per line containing events. This is how I configured props.conf for the source type:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[mysourcetype]
INDEXED_EXTRACTIONS = JSON
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y%m%d
TZ = UTC
detect_trailing_nulls = auto
SHOULD_LINEMERGE = false
description = My source type
pulldown_type = true
disabled = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;However, what is happening is as follows:&lt;BR /&gt;
 - Each event's &lt;CODE&gt;_raw&lt;/CODE&gt; contains a valid JSON object, as expected.&lt;BR /&gt;
 - Every field of the JSON object was extracted using its own name, as expected.&lt;BR /&gt;
 - The event timestamp is correctly set to the date contained in the &lt;CODE&gt;date&lt;/CODE&gt; field of the JSON object.&lt;BR /&gt;
 - Unexpectedly, all extracted fields are multivalued, with exactly two copies of the correct value present in the JSON object.&lt;/P&gt;

&lt;P&gt;Funnily enough, if I use &lt;CODE&gt;KV_MODE = JSON&lt;/CODE&gt; instead of using &lt;CODE&gt;INDEXED_EXTRACTIONS&lt;/CODE&gt; with the same data everything works perfectly.&lt;/P&gt;

&lt;P&gt;Any ideas on what might be going on?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2015 12:49:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188551#M37595</guid>
      <dc:creator>asieira</dc:creator>
      <dc:date>2015-03-18T12:49:53Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188552#M37596</link>
      <description>&lt;P&gt;I too have this problem. Using Splunk Cloud, if I upload a JSON file with the following settings:&lt;/P&gt;

&lt;P&gt;INDEXED_EXTRACTIONS = json&lt;BR /&gt;
KV_MODE = none&lt;BR /&gt;
NO_BINARY_CHECK = true&lt;BR /&gt;
SHOULD_LINEMERGE = true&lt;BR /&gt;
TIMESTAMP_FIELDS = time&lt;BR /&gt;
category = Structured&lt;BR /&gt;
description = JavaScript Object Notation&lt;BR /&gt;
disabled = false&lt;BR /&gt;
pulldown_type = true&lt;/P&gt;

&lt;P&gt;The data is imported correctly, no duplicate values. If I upload a file via a monitor on a Universal Forwarder with the following settings:&lt;/P&gt;

&lt;P&gt;INDEXED_EXTRACTIONS = json&lt;BR /&gt;
KV_MODE = none&lt;BR /&gt;
NO_BINARY_CHECK = true&lt;BR /&gt;
SHOULD_LINEMERGE = true&lt;BR /&gt;
TIMESTAMP_FIELDS = time&lt;/P&gt;

&lt;P&gt;The value for each event is duplicated. If I change KV_MODE to json and reindex, it makes no difference for me: the values are still duplicated.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 19:15:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188552#M37596</guid>
      <dc:creator>dsdb_splunkadmi</dc:creator>
      <dc:date>2020-09-28T19:15:03Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188553#M37597</link>
      <description>&lt;P&gt;In case it's important, I'm using Splunk Universal Forwarder 6.2.2 (build 255606)&lt;/P&gt;</description>
      <pubDate>Thu, 19 Mar 2015 08:57:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188553#M37597</guid>
      <dc:creator>dsdb_splunkadmi</dc:creator>
      <dc:date>2015-03-19T08:57:04Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188554#M37598</link>
      <description>&lt;P&gt;Try it like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://&amp;lt;path to JSON&amp;gt;/*.JSON]
INDEXED_EXTRACTIONS = JSON
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y%m%d
TZ = UTC
detect_trailing_nulls = auto
SHOULD_LINEMERGE = false
description = JSON
pulldown_type = true
disabled = false
sourcetype = JSON
KV_MODE = JSON
index = name_your_index
disabled = false
crcSalt = &amp;lt;SOURCE&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If that does not work, you can use the &lt;STRONG&gt;dedup&lt;/STRONG&gt; command when you run the search to eliminate the duplicate values,&lt;BR /&gt;
and use the &lt;STRONG&gt;mvexpand&lt;/STRONG&gt; command to expand the multivalued fields.&lt;BR /&gt;
For example:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;your_base_search_JSON | spath | eval temp=mvzip(college,mvzip(mark,studentname,"#"),"#") | mvexpand temp |......
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Mar 2015 13:52:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188554#M37598</guid>
      <dc:creator>fdi01</dc:creator>
      <dc:date>2015-03-19T13:52:58Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188555#M37599</link>
      <description>&lt;P&gt;Thank you for mentioning the dedup, it's a valid workaround. But I'd rather import the data correctly in the first place.&lt;/P&gt;

&lt;P&gt;However, if you keep both &lt;CODE&gt;INDEXED_EXTRACTIONS&lt;/CODE&gt; and &lt;CODE&gt;KV_MODE&lt;/CODE&gt; set to &lt;CODE&gt;JSON&lt;/CODE&gt; I would &lt;EM&gt;expect&lt;/EM&gt; to get duplicated values since Splunk would be extracting the fields both at index and at search time.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Mar 2015 14:01:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188555#M37599</guid>
      <dc:creator>asieira</dc:creator>
      <dc:date>2015-03-19T14:01:01Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188556#M37600</link>
      <description>&lt;P&gt;Interesting how you set &lt;CODE&gt;KV_MODE = none&lt;/CODE&gt;, it hadn't occurred to me to do that. Reading &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.2.2/admin/Propsconf"&gt;http://docs.splunk.com/Documentation/Splunk/6.2.2/admin/Propsconf&lt;/A&gt; I noticed that &lt;CODE&gt;KV_MODE&lt;/CODE&gt; defaults to &lt;CODE&gt;auto&lt;/CODE&gt; and more importantly that &lt;CODE&gt;AUTO_KV_JSON&lt;/CODE&gt; defaults to &lt;CODE&gt;true&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;In that case, it would make sense that Splunk would extract the fields both during index time &lt;EM&gt;and&lt;/EM&gt; during search time, thus duplicating the values.&lt;/P&gt;

&lt;P&gt;So maybe if I add both &lt;CODE&gt;KV_MODE = none&lt;/CODE&gt; &lt;EM&gt;and&lt;/EM&gt; &lt;CODE&gt;AUTO_KV_JSON = false&lt;/CODE&gt; to the original &lt;CODE&gt;props.conf&lt;/CODE&gt; file, things will work as intended. I'll try this later, and if you could try it on your end as well, we could confirm whether that is the problem.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Mar 2015 14:03:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188556#M37600</guid>
      <dc:creator>asieira</dc:creator>
      <dc:date>2015-03-19T14:03:59Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188557#M37601</link>
      <description>&lt;P&gt;Found it. Inspired by the comments and answer provided by &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/212740"&gt;@dsdb_splunkadmi&lt;/a&gt; and &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/114258"&gt;@fdi01&lt;/a&gt;, I found the problem was that I was enabling index-time extractions (via &lt;CODE&gt;INDEXED_EXTRACTIONS&lt;/CODE&gt;) but not disabling the search-time extractions that happen by default (due to the &lt;CODE&gt;KV_MODE&lt;/CODE&gt; and &lt;CODE&gt;AUTO_KV_JSON&lt;/CODE&gt; options). So both were occurring and generating duplicated extractions. &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;This is what finally worked:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[mysourcetype]
INDEXED_EXTRACTIONS = JSON
TIMESTAMP_FIELDS = date
TIME_FORMAT = %Y%m%d
TZ = UTC
detect_trailing_nulls = auto
SHOULD_LINEMERGE = false
KV_MODE = none
AUTO_KV_JSON = false
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Thanks everyone for their help.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 19:11:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188557#M37601</guid>
      <dc:creator>asieira</dc:creator>
      <dc:date>2020-09-28T19:11:29Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188558#M37602</link>
      <description>&lt;P&gt;Unfortunately, having both KV_MODE=none and AUTO_KV_JSON=false together in my props.conf did not fix the issue for me.&lt;/P&gt;

&lt;P&gt;I will do some tests to ensure the props.conf on the Universal Forwarder is definitely being applied.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 19:15:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188558#M37602</guid>
      <dc:creator>dsdb_splunkadmi</dc:creator>
      <dc:date>2020-09-28T19:15:23Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188559#M37603</link>
      <description>&lt;P&gt;Fixed it.&lt;/P&gt;

&lt;P&gt;In my case, I had to make sure that on the Splunk Cloud instance the same sourcetype was defined and also had KV_MODE = none.&lt;/P&gt;

&lt;P&gt;I had defined the type on my Universal Forwarder, but had not appreciated that some of the properties, like KV_MODE, are search-time properties, and hence they have to be defined on the search instance (not just on the forwarder).&lt;/P&gt;
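
&lt;P&gt;As a rough sketch of that split (assuming the sourcetype is named &lt;CODE&gt;mysourcetype&lt;/CODE&gt; as in the question, and showing only the two settings relevant to the duplication; the comments are my own annotations):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf on the Universal Forwarder
# (index-time setting: the forwarder parses the structured data)
[mysourcetype]
INDEXED_EXTRACTIONS = JSON

# props.conf (or the sourcetype definition) on the search instance
# (search-time setting: turns off the second, search-time extraction)
[mysourcetype]
KV_MODE = none
&lt;/CODE&gt;&lt;/PRE&gt;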

&lt;P&gt;I didn't have to use the AUTO_KV_JSON = false setting in the end.&lt;/P&gt;

&lt;P&gt;You put me on the right path though with the index-time vs. search-time double extraction - thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 19:15:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188559#M37603</guid>
      <dc:creator>dsdb_splunkadmi</dc:creator>
      <dc:date>2020-09-28T19:15:26Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188560#M37604</link>
      <description>&lt;P&gt;Don't mention it. Actually thank &lt;EM&gt;you&lt;/EM&gt; for guiding me to the right path by posting your example with &lt;CODE&gt;KV_MODE = none&lt;/CODE&gt; in the first place. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Mar 2015 16:31:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188560#M37604</guid>
      <dc:creator>asieira</dc:creator>
      <dc:date>2015-03-19T16:31:12Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188561#M37605</link>
      <description>&lt;P&gt;I am having a similar issue. However, I only see duplicates when running a raw search and expanding an event to look at all its fields; when I print the field using the table command, I don't see any duplicate values. Is anyone aware of this behaviour and why it happens?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jun 2016 19:07:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188561#M37605</guid>
      <dc:creator>sanchitguptaiit</dc:creator>
      <dc:date>2016-06-21T19:07:56Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188562#M37606</link>
      <description>&lt;P&gt;For the above Accepted Answer, I would point out:&lt;BR /&gt;
I put the above configuration in my etc/system/local/props.conf for my Universal Forwarder installation. &lt;BR /&gt;
I also needed to ensure that on my Splunk Cloud Light instance, for the source type "mysourcetype", the following properties were set (under "Advanced"):&lt;/P&gt;

&lt;P&gt;INDEXED_EXTRACTIONS = json &lt;/P&gt;

&lt;P&gt;KV_MODE = none&lt;/P&gt;</description>
      <pubDate>Thu, 26 Oct 2017 20:04:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188562#M37606</guid>
      <dc:creator>cfoleydivert</dc:creator>
      <dc:date>2017-10-26T20:04:09Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue with duplicate values?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188563#M37607</link>
      <description>&lt;P&gt;In fact, the aforementioned two properties on the Splunk Cloud Light source type definition solved my duplication problem even without the addition of AUTO_KV_JSON on the forwarder side (had KV_MODE = none already in forwarder's config).&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 16:25:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/188563#M37607</guid>
      <dc:creator>cfoleydivert</dc:creator>
      <dc:date>2020-09-29T16:25:48Z</dc:date>
    </item>
    <item>
      <title>Re: Why is my sourcetype configuration for JSON events with INDEXED_EXTRACTIONS making each extracted field multivalue w</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/712699#M117709</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I don't have admin rights in Splunk. Is there an easy way to enforce this in the search query?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2025 14:33:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Why-is-my-sourcetype-configuration-for-JSON-events-with-INDEXED/m-p/712699#M117709</guid>
      <dc:creator>tozaltin</dc:creator>
      <dc:date>2025-02-27T14:33:41Z</dc:date>
    </item>
  </channel>
</rss>

