<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Will Splunk automatically remove duplicate data based on index time? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261254#M50135</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I need to monitor a folder that holds a one-time load of historic data. Another system now SFTPs CSV files to the Splunk instance every 15 minutes, containing only updates and new inserts. My queries already use &lt;CODE&gt;dedup&lt;/CODE&gt;. For updates, will the old duplicate be removed automatically based on index time, or do I have to add something specific to remove old records? For inserts, I assume it should work fine. Experts, please guide me on this.&lt;/P&gt;</description>
    <pubDate>Fri, 02 Dec 2016 09:33:00 GMT</pubDate>
    <dc:creator>k_harini</dc:creator>
    <dc:date>2016-12-02T09:33:00Z</dc:date>
    <item>
      <title>Will Splunk automatically remove duplicate data based on index time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261254#M50135</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I need to monitor a folder that holds a one-time load of historic data. Another system now SFTPs CSV files to the Splunk instance every 15 minutes, containing only updates and new inserts. My queries already use &lt;CODE&gt;dedup&lt;/CODE&gt;. For updates, will the old duplicate be removed automatically based on index time, or do I have to add something specific to remove old records? For inserts, I assume it should work fine. Experts, please guide me on this.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2016 09:33:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261254#M50135</guid>
      <dc:creator>k_harini</dc:creator>
      <dc:date>2016-12-02T09:33:00Z</dc:date>
    </item>
    <item>
      <title>Re: Will Splunk automatically remove duplicate data based on index time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261255#M50136</link>
      <description>&lt;P&gt;So are you asking whether Splunk can detect that data from a log file has already been indexed and ONLY index new data?&lt;/P&gt;

&lt;P&gt;If so, then yes: Splunk will only forward new data that has NOT already been indexed, without you having to run a dedup command. For example:&lt;/P&gt;

&lt;P&gt;You have a log file that you're monitoring. That log file is currently 100MB, and a forwarder has already forwarded that 100MB of data. Now a flurry of calls comes in and the file grows to 110MB. The forwarder will forward only the new 10MB, recognizing that the first 100MB has already been forwarded and ignoring it.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2016 12:50:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261255#M50136</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2016-12-02T12:50:51Z</dc:date>
    </item>
    <item>
      <title>Re: Will Splunk automatically remove duplicate data based on index time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261256#M50137</link>
      <description>&lt;P&gt;Actually, old records might receive updates; for example, the status might change from open to closed. In that case the old record should be deleted and only the new one retained. Any new record inserted into the database should be indexed. I'm concerned about the first case, where the old record should be removed. I want to know whether dedup will remove duplicate records based on index time.&lt;/P&gt;</description>
      <pubDate>Sat, 03 Dec 2016 05:52:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261256#M50137</guid>
      <dc:creator>k_harini</dc:creator>
      <dc:date>2016-12-03T05:52:36Z</dc:date>
    </item>
    <item>
      <title>Re: Will Splunk automatically remove duplicate data based on index time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261257#M50138</link>
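      <description>&lt;P&gt;Dedup should give you the single most recent record: it keeps the first event it encounters for each field value, and in the default reverse-time search order that is the latest one by _time (not index time). It only filters search results and removes nothing from the index. However, it is an expensive command.&lt;/P&gt;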

&lt;P&gt;Alternatively, you can get the same result by running stats on the data, grouped by the field that identifies a record, and pulling latest(_time), latest(yourdatafield), latest(statusfield), and so on as needed, since dedup will be more expensive.&lt;/P&gt;
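
&lt;P&gt;To make the difference between the two approaches concrete, here is a rough sketch in plain Python rather than SPL. It only models the selection logic and is not how Splunk implements either command; the field names ticket_id and status and the sample events are hypothetical.&lt;/P&gt;

```python
def dedup_first_seen(events, key):
    """Keep the first event seen per key, in the order given (dedup-like)."""
    seen = set()
    kept = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            kept.append(event)
    return kept


def latest_per_key(events, key, time_field="_time"):
    """Keep, per key, the event with the greatest time value (stats latest-like)."""
    latest = {}
    for event in events:
        current = latest.get(event[key])
        candidates = [event] if current is None else [current, event]
        # ISO-8601 strings in one format compare correctly as plain strings.
        latest[event[key]] = max(candidates, key=lambda e: e[time_field])
    return list(latest.values())


# Hypothetical sample: ticket T1 is first open, then updated to closed.
events = [
    {"ticket_id": "T1", "status": "open", "_time": "2016-12-02T09:00:00"},
    {"ticket_id": "T2", "status": "open", "_time": "2016-12-02T09:05:00"},
    {"ticket_id": "T1", "status": "closed", "_time": "2016-12-02T09:15:00"},
]

# Newest-first order, as a default search would return events.
newest_first = sorted(events, key=lambda e: e["_time"], reverse=True)

print(dedup_first_seen(newest_first, "ticket_id"))  # T1 appears as closed
print(dedup_first_seen(events, "ticket_id"))        # oldest-first: T1 appears as open
print(latest_per_key(events, "ticket_id"))          # T1 closed regardless of input order
```

&lt;P&gt;The takeaway is that the dedup-style result depends on the order of the incoming events, while the latest-per-key result does not. In both cases the older T1 record still exists in the input; it is only left out of the result.&lt;/P&gt;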

&lt;P&gt;Splunk also has a delete command that renders older data unsearchable; however, check carefully that this is exactly what you require. Refer to the Splunk documentation: &lt;A href="https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Delete"&gt;https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Delete&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Dec 2016 06:41:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261257#M50138</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2016-12-03T06:41:47Z</dc:date>
    </item>
    <item>
      <title>Re: Will Splunk automatically remove duplicate data based on index time?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261258#M50139</link>
      <description>&lt;P&gt;OK, thanks a lot. I will look into the delete functionality.&lt;/P&gt;</description>
      <pubDate>Sat, 03 Dec 2016 06:48:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Will-Splunk-automatically-remove-duplicate-data-based-on-index/m-p/261258#M50139</guid>
      <dc:creator>k_harini</dc:creator>
      <dc:date>2016-12-03T06:48:46Z</dc:date>
    </item>
  </channel>
</rss>

