<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using INDEXED_EXTRACTIONS=json produces duplicate values in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407255#M72212</link>
    <description>&lt;P&gt;Before you ask, I have found at least 10 questions similar to this as well as two identical questions, both of which are unresolved.&lt;/P&gt;

&lt;P&gt;I have one sourcetype that extracts fields from JSON properly.  Awesome, no problem.  I created a second sourcetype with the &lt;STRONG&gt;same&lt;/STRONG&gt; settings, and &lt;EM&gt;all fields are extracted twice during a search&lt;/EM&gt;.  The only difference in the data is that the first sourcetype has the JSON on a single line, while the second has the JSON indented across multiple lines.  The result is a multi-value field, not a duplicate event.&lt;/P&gt;

&lt;P&gt;I'm running v7.0.1 with forwarders.  I am at a loss as to what to check next.  Suggestions?&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

&lt;H3&gt;FIRST (ORIGINAL-WORKS FINE)&lt;/H3&gt;

&lt;PRE&gt;&lt;CODE&gt;SHOULD_LINEMERGE = true
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
CHARSET=UTF-8
KV_MODE = none
AUTO_KV_JSON = false
category=Structured
description=JavaScript Object...
disabled=false
pulldown_type=true
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT=%Y-%m-%dT%H%M%S%Z
TRUNCATE=0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;H3&gt;SECOND (EXTRACTS DUPLICATES)&lt;/H3&gt;

&lt;PRE&gt;&lt;CODE&gt;INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
CHARSET=UTF-8
KV_MODE = none
AUTO_KV_JSON = false
category=Structured
description=JavaScript Object...
disabled=false
pulldown_type=true
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT=%Y-%m-%dT%H%M%S%Z
TRUNCATE=0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;along with all combinations of &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;BREAK_ONLY_BEFORE_DATE = [true | false]
SHOULD_LINEMERGE = [true | false]
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 22 May 2018 21:14:20 GMT</pubDate>
    <dc:creator>mgallacher</dc:creator>
    <dc:date>2018-05-22T21:14:20Z</dc:date>
    <item>
      <title>Using INDEXED_EXTRACTIONS=json produces duplicate values</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407255#M72212</link>
      <description>&lt;P&gt;Before you ask, I have found at least 10 questions similar to this as well as two identical questions, both of which are unresolved.&lt;/P&gt;

&lt;P&gt;I have one sourcetype that extracts fields from JSON properly.  Awesome, no problem.  I created a second sourcetype with the &lt;STRONG&gt;same&lt;/STRONG&gt; settings, and &lt;EM&gt;all fields are extracted twice during a search&lt;/EM&gt;.  The only difference in the data is that the first sourcetype has the JSON on a single line, while the second has the JSON indented across multiple lines.  The result is a multi-value field, not a duplicate event.&lt;/P&gt;

&lt;P&gt;I'm running v7.0.1 with forwarders.  I am at a loss as to what to check next.  Suggestions?&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

&lt;H3&gt;FIRST (ORIGINAL-WORKS FINE)&lt;/H3&gt;

&lt;PRE&gt;&lt;CODE&gt;SHOULD_LINEMERGE = true
INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
CHARSET=UTF-8
KV_MODE = none
AUTO_KV_JSON = false
category=Structured
description=JavaScript Object...
disabled=false
pulldown_type=true
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT=%Y-%m-%dT%H%M%S%Z
TRUNCATE=0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;H3&gt;SECOND (EXTRACTS DUPLICATES)&lt;/H3&gt;

&lt;PRE&gt;&lt;CODE&gt;INDEXED_EXTRACTIONS = json
NO_BINARY_CHECK = true
CHARSET=UTF-8
KV_MODE = none
AUTO_KV_JSON = false
category=Structured
description=JavaScript Object...
disabled=false
pulldown_type=true
TIMESTAMP_FIELDS = timestamp
TIME_FORMAT=%Y-%m-%dT%H%M%S%Z
TRUNCATE=0
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;along with all combinations of &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;BREAK_ONLY_BEFORE_DATE = [true | false]
SHOULD_LINEMERGE = [true | false]
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 22 May 2018 21:14:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407255#M72212</guid>
      <dc:creator>mgallacher</dc:creator>
      <dc:date>2018-05-22T21:14:20Z</dc:date>
    </item>
    <item>
      <title>Re: Using INDEXED_EXTRACTIONS=json produces duplicate values</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407256#M72213</link>
      <description>&lt;P&gt;What are the names of your two sourcetypes?&lt;BR /&gt;
Where have you deployed them: on the search head (SH) or on the forwarder?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Aug 2019 03:13:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407256#M72213</guid>
      <dc:creator>iparitosh</dc:creator>
      <dc:date>2019-08-28T03:13:32Z</dc:date>
    </item>
    <item>
      <title>Re: Using INDEXED_EXTRACTIONS=json produces duplicate values</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407257#M72214</link>
      <description>&lt;P&gt;I'm having the exact same problem and can't figure it out.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Oct 2019 22:05:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Using-INDEXED-EXTRACTIONS-json-produces-duplicate-values/m-p/407257#M72214</guid>
      <dc:creator>mstrozyk</dc:creator>
      <dc:date>2019-10-22T22:05:58Z</dc:date>
    </item>
  </channel>
</rss>

