<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Reindex Duplicates / Reindex duplicate data in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232275#M45274</link>
    <description>&lt;P&gt;Hi Splunkers,&lt;/P&gt;

&lt;P&gt;I have a somewhat complicated issue that I cannot figure out.&lt;/P&gt;

&lt;P&gt;Every day I receive a host file that we index, and 80% of its data is duplicated from the days before (same timestamps, same everything). Splunk only indexes the non-duplicates, which I would usually appreciate.&lt;/P&gt;

&lt;P&gt;However, for a specific report I need a setting that makes Splunk overwrite the old events and index the duplicates from the latest source.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;For example:

On 1/17 I get event 1111 (timestamp 1/17) in logfile1_17.txt.

On 1/18 the same event 1111 (timestamp 1/17) arrives in logfile1_18.txt.

When I search for event 1111, its source is logfile1_17.txt, but I need it indexed again with source logfile1_18.txt.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Is there a setting so that event 1111 is shown in Splunk with the latest indexed file as its source, or is indexed again?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;

&lt;P&gt;Oliver&lt;/P&gt;</description>
    <pubDate>Wed, 18 Jan 2017 14:34:30 GMT</pubDate>
    <dc:creator>omuelle1</dc:creator>
    <dc:date>2017-01-18T14:34:30Z</dc:date>
    <item>
      <title>Reindex Duplicates / Reindex duplicate data</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232275#M45274</link>
      <description>&lt;P&gt;Hi Splunkers,&lt;/P&gt;

&lt;P&gt;I have a somewhat complicated issue that I cannot figure out.&lt;/P&gt;

&lt;P&gt;Every day I receive a host file that we index, and 80% of its data is duplicated from the days before (same timestamps, same everything). Splunk only indexes the non-duplicates, which I would usually appreciate.&lt;/P&gt;

&lt;P&gt;However, for a specific report I need a setting that makes Splunk overwrite the old events and index the duplicates from the latest source.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;For example:

On 1/17 I get event 1111 (timestamp 1/17) in logfile1_17.txt.

On 1/18 the same event 1111 (timestamp 1/17) arrives in logfile1_18.txt.

When I search for event 1111, its source is logfile1_17.txt, but I need it indexed again with source logfile1_18.txt.
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Is there a setting so that event 1111 is shown in Splunk with the latest indexed file as its source, or is indexed again?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;

&lt;P&gt;Oliver&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2017 14:34:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232275#M45274</guid>
      <dc:creator>omuelle1</dc:creator>
      <dc:date>2017-01-18T14:34:30Z</dc:date>
    </item>
    <item>
      <title>Re: Reindex Duplicates / Reindex duplicate data</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232276#M45275</link>
      <description>&lt;P&gt;If this is the behaviour you want, consider using a batch input instead of monitor:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[batch://&amp;lt;path&amp;gt;]
disabled = false
move_policy = sinkhole
index = yourindex
sourcetype = somesourcetype
&lt;/CODE&gt;&lt;/PRE&gt;
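
&lt;P&gt;If deleting the source files is not acceptable, a monitor input with crcSalt can also force the renamed daily file to be indexed again even though its content matches an earlier file. A minimal sketch (the path, index, and sourcetype below are placeholders):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://&amp;lt;path&amp;gt;]
disabled = false
index = yourindex
sourcetype = somesourcetype
# Mix the full source path into the file's CRC so a renamed copy
# of the same content is treated as a new file and re-indexed.
crcSalt = &amp;lt;SOURCE&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note that either approach appends the re-indexed events rather than overwriting the old ones, so both copies of each event will appear in search results.&lt;/P&gt;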

&lt;P&gt;The batch input indexes any file placed in that directory and deletes it once ingested (keep in mind that you lose the file after indexing). With move_policy = sinkhole, Splunk re-indexes a file dropped into that folder even if its content has been seen before, instead of following the tail of the existing copy.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2017 04:40:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232276#M45275</guid>
      <dc:creator>nabeel652</dc:creator>
      <dc:date>2017-01-19T04:40:36Z</dc:date>
    </item>
    <item>
      <title>Re: Reindex Duplicates / Reindex duplicate data</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232277#M45276</link>
      <description>&lt;P&gt;Thank you. I went with crcSalt = &amp;lt;SOURCE&amp;gt;, which also seems to re-index the duplicates.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2017 15:30:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232277#M45276</guid>
      <dc:creator>omuelle1</dc:creator>
      <dc:date>2017-01-19T15:30:40Z</dc:date>
    </item>
    <item>
      <title>Re: Reindex Duplicates / Reindex duplicate data</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232278#M45277</link>
      <description>&lt;P&gt;Would this duplicate existing data? Would the summary indices be automatically updated?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Mar 2017 15:46:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Reindex-Duplicates-Reindex-duplicate-data/m-p/232278#M45277</guid>
      <dc:creator>gurugv</dc:creator>
      <dc:date>2017-03-21T15:46:33Z</dc:date>
    </item>
  </channel>
</rss>

