<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Bug: Duplicate values with INDEXED_EXTRACTION? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/528525#M89101</link>
    <description>&lt;P&gt;I have a JSON file (.json extension) that contains the complete JSON on a single line. Every 5 minutes, new events are appended to the JSON array on that same line.&lt;/P&gt;&lt;P&gt;I have gone through multiple responses related to duplicate events for JSON. This is what my props.conf looks like on both the search head and the indexer, but I still see duplicate events when searching on the search head:&lt;/P&gt;&lt;PRE&gt;[dell:boomi:atom]
LINE_BREAKER=(\},)
MUST_BREAK_AFTER=([\},])
SHOULD_LINEMERGE=false
SEDCMD-remove_header=s/({"jmx":\[)//g
SEDCMD-remove_footer=s/(}]})//g
INDEXED_EXTRACTIONS = JSON
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX={"(?=\d+-\d+-\d+T)
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD=24
TRUNCATE = 0&lt;/PRE&gt;</description>
    <pubDate>Mon, 09 Nov 2020 12:47:47 GMT</pubDate>
    <dc:creator>divman</dc:creator>
    <dc:date>2020-11-09T12:47:47Z</dc:date>
    <item>
      <title>Bug: Why are there duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500193#M85235</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;This is a long-running issue: Splunk creates duplicate values in multivalue fields when JSON extraction runs at index time &lt;STRONG&gt;and&lt;/STRONG&gt; at search time. Especially in a distributed environment, it can be mind-boggling to find the right set of configurations to finally make it work. Can somebody please give us some details or documentation on how the whole extraction process works internally? I feel that we're all stuck in a "trial and error" state of mind, and I'd really like to progress to the stage of "knowing what actually happens, so that we can cope".&lt;/P&gt;
&lt;P&gt;Hint for development: ideally, Splunk would be smart enough to realize that if a field has already been extracted at index time, there is no need to extract it again. A simple &lt;EM&gt;if&lt;/EM&gt; clause in the code could make the whole configuration issue a lot simpler, speed up search-time extractions, and make apps in distributed environments more maintainable.&lt;/P&gt;
&lt;P&gt;Oliver&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2023 18:26:18 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500193#M85235</guid>
      <dc:creator>ololdach</dc:creator>
      <dc:date>2023-01-19T18:26:18Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500194#M85236</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;This is a very good example of how Splunk handles JSON data:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions" target="_blank"&gt;https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;One thing to keep in mind: if you use INDEXED_EXTRACTIONS=json, set KV_MODE=none. If you are not using INDEXED_EXTRACTIONS, use KV_MODE=json.&lt;/P&gt;
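&lt;P&gt;A minimal props.conf sketch of the two alternatives, written as they might look on a single instance. The sourcetype names are hypothetical. In a distributed deployment, INDEXED_EXTRACTIONS has to be on the instance that first reads the data (usually the universal forwarder), while KV_MODE and AUTO_KV_JSON have to be visible on the search head.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# Option 1: fields extracted at index time (hypothetical sourcetype name)
[my_json_indexed]
INDEXED_EXTRACTIONS = json
# Turn off search-time JSON extraction so the fields are not extracted a second time
KV_MODE = none
AUTO_KV_JSON = false

# Option 2: fields extracted at search time only (hypothetical sourcetype name)
[my_json_search_time]
KV_MODE = json&lt;/LI-CODE&gt;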

&lt;P&gt;Hope this is what you are looking for.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 02:38:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500194#M85236</guid>
      <dc:creator>badrinath_itrs</dc:creator>
      <dc:date>2020-09-30T02:38:12Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500195#M85237</link>
      <description>&lt;P&gt;The bug/problem: a user-defined JSON sourcetype with INDEXED_EXTRACTIONS = json results in all fields being displayed as duplicate-value multivalue fields when searched. This happens even if KV_MODE is set to &lt;EM&gt;none&lt;/EM&gt; for this sourcetype.&lt;/P&gt;

&lt;P&gt;We did extensive testing to nail down this issue in both single-instance and distributed environments, and it drove us mad, because a configuration that worked in one environment did not work in another, seemingly identical one. After a lot of research, it boiled down to a simple visibility issue. Here are our lessons learned:&lt;BR /&gt;
1. The whole issue is caused by search-time artifacts, and only the search head configuration needs to be changed.&lt;BR /&gt;
2. The props.conf with the sourcetype definition, including KV_MODE=none, has to be visible/accessible in the context of the search.&lt;BR /&gt;
3. When you define the sourcetype inside a TA, separate from the app that runs the searches, you need to set export = system in the TA's metadata (local.meta or default.meta). In our case, we had simply forgotten to include export = system in the sourcetype's stanza in the TA's ./metadata/default.meta. Once we added the export setting, the duplicate values in our searches were gone.&lt;/P&gt;
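&lt;P&gt;For reference, a minimal sketch of such a stanza in the TA's ./metadata/default.meta. It assumes the sourcetype is defined in the TA's own props.conf; the [props] stanza applies to every props.conf object in that app, not only to one sourcetype.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# ./metadata/default.meta inside the TA that defines the sourcetype
[props]
# Make the TA's props.conf settings (including KV_MODE = none) visible to searches run from other apps
export = system&lt;/LI-CODE&gt;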

&lt;P&gt;Still, we consider it a kludge that Splunk does not realise the JSON has already been extracted at index time and wastes additional time re-extracting it at search time. We hope that our ordeal helps others save time on this subject.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 02:38:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500195#M85237</guid>
      <dc:creator>ololdach</dc:creator>
      <dc:date>2020-09-30T02:38:15Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500196#M85238</link>
      <description>&lt;P&gt;Hi, thanks for your help. We finally got this one right, and because several people have been looking for hints on how to resolve this duplicate-value issue, we decided to highlight our answer below.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Oct 2019 12:16:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/500196#M85238</guid>
      <dc:creator>ololdach</dc:creator>
      <dc:date>2019-10-16T12:16:51Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/528524#M89100</link>
      <description>&lt;P&gt;I have a JSON file (.json extension) that contains the complete JSON on a single line. Every 5 minutes, new events are appended to the JSON array on that same line.&lt;/P&gt;&lt;P&gt;I have gone through multiple responses related to duplicate events for JSON. This is what my props.conf looks like on both the search head and the indexer, but I still see duplicate events when searching on the search head:&lt;/P&gt;&lt;PRE&gt;[dell:boomi:atom]
LINE_BREAKER=(\},)
MUST_BREAK_AFTER=([\},])
SHOULD_LINEMERGE=false
SEDCMD-remove_header=s/({"jmx":\[)//g
SEDCMD-remove_footer=s/(}]})//g
INDEXED_EXTRACTIONS = JSON
KV_MODE = none
AUTO_KV_JSON = false
TIME_PREFIX={"(?=\d+-\d+-\d+T)
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD=24
TRUNCATE = 0&lt;/PRE&gt;</description>
      <pubDate>Mon, 09 Nov 2020 12:46:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/528524#M89100</guid>
      <dc:creator>divman</dc:creator>
      <dc:date>2020-11-09T12:46:57Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/551003#M91486</link>
      <description>&lt;P&gt;I am still getting duplicates, even though I have changed my local.meta file. My props.conf:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[test_json]
AUTO_KV_JSON = false
INDEXED_EXTRACTIONS = json
KV_MODE = none&lt;/LI-CODE&gt;&lt;P&gt;My local.meta:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[]
access = read : [ * ], write : [ * ]
export = system&lt;/LI-CODE&gt;&lt;P&gt;The environment is distributed, and I am ingesting this data from the search head. The sample data I tried with is &lt;A href="https://jsonformatter.org/json-editor/a2ec9f" target="_blank"&gt;https://jsonformatter.org/json-editor/a2ec9f&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 09 May 2021 16:50:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/551003#M91486</guid>
      <dc:creator>sanjeev543</dc:creator>
      <dc:date>2021-05-09T16:50:38Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/627632#M107728</link>
      <description>&lt;P&gt;Thanks for the solution. You're right, export=system was needed on the SHC &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2023 18:22:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/627632#M107728</guid>
      <dc:creator>splunkreal</dc:creator>
      <dc:date>2023-01-19T18:22:59Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/676784#M113189</link>
      <description>&lt;P&gt;I'm experiencing the same issue. Can you advise where you defined this setting, please? We're using Splunk Cloud, so I'm not sure how to access the local.meta file.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 23:37:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/676784#M113189</guid>
      <dc:creator>tomapatan</dc:creator>
      <dc:date>2024-02-06T23:37:52Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Why are there duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/681464#M113859</link>
      <description>&lt;P&gt;The INDEXED_EXTRACTIONS configuration belongs in props.conf on the universal forwarder.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;|tstats count where index=* sourcetype=my_json_data by host | stats values(host)&lt;/LI-CODE&gt;&lt;P&gt;The search above should tell you which hosts need to be looked at: remove INDEXED_EXTRACTIONS = json from the SHs and indexers, and move this configuration (INDEXED_EXTRACTIONS = json) to props.conf on those forwarders.&lt;/P&gt;&lt;P&gt;Make sure the forwarder's inputs.conf for the JSON source you are ingesting tags the data with the appropriate sourcetype, then reference that sourcetype stanza in props.conf for your config.&lt;/P&gt;&lt;P&gt;For example (UF), inputs.conf:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[monitor:///file]
sourcetype=foo_json
index=bar&lt;/LI-CODE&gt;&lt;P&gt;props.conf:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[foo_json]
INDEXED_EXTRACTIONS = json&lt;/LI-CODE&gt;&lt;P&gt;See: &lt;A class="" href="https://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Configurationparametersandthedatapipeline?_gl=1*nufu65*_ga*MTIwNjQ4MTI1My4xNjkzODU5Nzk3*_ga_GS7YF8S63Y*MTcxMDk2OTA0MC45LjEuMTcxMDk2OTk2OC41My4wLjA.*_ga_5EPM2P39FV*MTcxMDk2NTMyMi4xNS4xLjE3MTA5Njk5ODUuMC4wLjA.&amp;amp;_ga=2.147263155.568450395.1710801981-1206481253.1693859797" target="_blank" rel="noopener noreferrer"&gt;https://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Configurationparametersandthedatapipeline&lt;/A&gt;&lt;/P&gt;&lt;P&gt;INDEXED_EXTRACTIONS are unique in that they happen in the structured parsing queue of the universal forwarder, whereas parsing usually happens on a HF, or on the indexer if there is no HF. If you use a HF as the first point of ingest and no UF, place the setting on the HF.&lt;/P&gt;&lt;P&gt;See: &lt;A class="" href="https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Extractfieldsfromfileswithstructureddata" target="_blank" rel="noopener noreferrer"&gt;https://docs.splunk.com/Documentation/SplunkCloud/latest/Data/Extractfieldsfromfileswithstructureddata&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you have Splunk Cloud Platform and want to configure the extraction of fields from structured data, use the Splunk universal forwarder.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Mar 2024 22:06:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/681464#M113859</guid>
      <dc:creator>rphillips_splk</dc:creator>
      <dc:date>2024-03-20T22:06:59Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Why are there duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/681563#M113879</link>
      <description>&lt;P&gt;We're ingesting data using a REST API call, not a UF, but we are still experiencing the issue with duplicate values.&lt;/P&gt;
&lt;P&gt;We created an app using the Add-on Builder, then deployed it onto one of our HFs, which ingests the data and sends it to Splunk Cloud.&lt;/P&gt;
&lt;P&gt;Settings on the HF:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;KV_MODE = none
INDEXED_EXTRACTIONS = json &lt;/LI-CODE&gt;
&lt;P&gt;Any advice would be appreciated.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Toma&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 15:53:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/681563#M113879</guid>
      <dc:creator>tomapatan</dc:creator>
      <dc:date>2024-03-21T15:53:14Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Why are there duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/681566#M113880</link>
      <description>&lt;P&gt;KV_MODE (and AUTO_KV_JSON) are search-time options; they are needed on the search heads, not on HFs/indexers.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 15:39:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/681566#M113880</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2024-03-21T15:39:36Z</dc:date>
    </item>
    <item>
      <title>Re: Bug: Why are there duplicate values with INDEXED_EXTRACTION?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/682006#M113952</link>
      <description>&lt;P&gt;Settings on the SH as follows:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;AUTO_KV_JSON = false
KV_MODE = none&lt;/LI-CODE&gt;&lt;P&gt;Settings on the HF:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;AUTO_KV_JSON = false
INDEXED_EXTRACTIONS = json
KV_MODE = none&lt;/LI-CODE&gt;&lt;P&gt;The values are still getting duplicated. Do you have any more suggestions for us?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 14:05:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Bug-Why-are-there-duplicate-values-with-INDEXED-EXTRACTION/m-p/682006#M113952</guid>
      <dc:creator>tomapatan</dc:creator>
      <dc:date>2024-03-26T14:05:13Z</dc:date>
    </item>
  </channel>
</rss>

