<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to handle a daily changing CSV file and avoid indexing duplicate events/rows? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211875#M41712</link>
<description>&lt;P&gt;If the following is your case:&lt;/P&gt;

&lt;P&gt;Sep 1, 2015:&lt;BR /&gt;
                  The file has 10 records.&lt;BR /&gt;
Sep 2, 2015:&lt;BR /&gt;
                  The file has 17 records (7 new, plus the same 10 regenerated from the previous day's file).&lt;BR /&gt;
Sep 3, 2015:&lt;BR /&gt;
                  The file has 25 records in total (10 from Sep 1, 7 from Sep 2, and 8 new records from Sep 3).&lt;/P&gt;

&lt;P&gt;Then you can monitor the file continuously and make sure new data is appended to the existing file rather than written to a fresh copy (if you manage the file manually, copy only the new rows into the old file). Splunk's monitor input indexes only the lines appended since its last read, so you won't get duplicate records.&lt;/P&gt;

&lt;P&gt;If your scenario is different, you can instead deduplicate at search time: &lt;CODE&gt;index=something sourcetype=csv source=path/filename.csv | dedup _raw | your analysis code&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Hope this is helpful for you.&lt;/P&gt;</description>
    <pubDate>Wed, 02 Sep 2015 13:55:34 GMT</pubDate>
    <dc:creator>thirumalreddyb</dc:creator>
    <dc:date>2015-09-02T13:55:34Z</dc:date>
    <item>
      <title>How to handle a daily changing CSV file and avoid indexing duplicate events/rows?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211874#M41711</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have a daily growing CSV file that I want to index. Simply importing it every day would result in a lot of duplicate events. I've read about the followTail option, but also that it is not recommended. How can I avoid duplicate events? My first thought was to create a daily scheduled search that deletes all "old" events and keeps only those from the last indexed file, but I hope there is a better option.&lt;/P&gt;

&lt;P&gt;Cheers&lt;BR /&gt;
Heinz&lt;/P&gt;</description>
      <pubDate>Wed, 02 Sep 2015 10:04:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211874#M41711</guid>
      <dc:creator>HeinzWaescher</dc:creator>
      <dc:date>2015-09-02T10:04:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle a daily changing CSV file and avoid indexing duplicate events/rows?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211875#M41712</link>
<description>&lt;P&gt;If the following is your case:&lt;/P&gt;

&lt;P&gt;Sep 1, 2015:&lt;BR /&gt;
                  The file has 10 records.&lt;BR /&gt;
Sep 2, 2015:&lt;BR /&gt;
                  The file has 17 records (7 new, plus the same 10 regenerated from the previous day's file).&lt;BR /&gt;
Sep 3, 2015:&lt;BR /&gt;
                  The file has 25 records in total (10 from Sep 1, 7 from Sep 2, and 8 new records from Sep 3).&lt;/P&gt;

&lt;P&gt;Then you can monitor the file continuously and make sure new data is appended to the existing file rather than written to a fresh copy (if you manage the file manually, copy only the new rows into the old file). Splunk's monitor input indexes only the lines appended since its last read, so you won't get duplicate records.&lt;/P&gt;
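
&lt;P&gt;As an illustrative sketch (the path, index, and sourcetype below are placeholders for your environment), such a monitor input in inputs.conf could look like:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;[monitor:///var/data/daily_report.csv]&lt;BR /&gt;
index = something&lt;BR /&gt;
sourcetype = csv&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;With a stanza like this, Splunk tails the file and picks up only the newly appended rows, as long as the existing rows at the top of the file are left untouched.&lt;/P&gt;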

&lt;P&gt;If your scenario is different, you can instead deduplicate at search time: &lt;CODE&gt;index=something sourcetype=csv source=path/filename.csv | dedup _raw | your analysis code&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Hope this is helpful for you.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Sep 2015 13:55:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211875#M41712</guid>
      <dc:creator>thirumalreddyb</dc:creator>
      <dc:date>2015-09-02T13:55:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle a daily changing CSV file and avoid indexing duplicate events/rows?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211876#M41713</link>
      <description>&lt;P&gt;Maybe you should consider looking at the KV store. I believe it has an upsert capability through its RESTful interface.&lt;/P&gt;

&lt;P&gt;&lt;A href="http://docs.splunk.com/Documentation/Splunk/6.2.5/Admin/AboutKVstore"&gt;http://docs.splunk.com/Documentation/Splunk/6.2.5/Admin/AboutKVstore&lt;/A&gt;&lt;/P&gt;</description>
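
&lt;P&gt;As a rough sketch (host, credentials, app, collection, and field names below are all placeholders), a record could be inserted into a KV store collection through the REST interface, and then replaced in place by addressing its &lt;CODE&gt;_key&lt;/CODE&gt;:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;curl -k -u admin:changeme https://localhost:8089/servicesNS/nobody/myapp/storage/collections/data/mycollection -H "Content-Type: application/json" -d '{"day": "2015-09-02", "count": 17}'&lt;BR /&gt;
curl -k -u admin:changeme https://localhost:8089/servicesNS/nobody/myapp/storage/collections/data/mycollection/row42 -H "Content-Type: application/json" -d '{"day": "2015-09-02", "count": 25}'&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;The first call inserts a new record; the second replaces the record whose key is row42, which gives the upsert-style behavior mentioned above.&lt;/P&gt;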
      <pubDate>Wed, 02 Sep 2015 17:34:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211876#M41713</guid>
      <dc:creator>jaredlaney</dc:creator>
      <dc:date>2015-09-02T17:34:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle a daily changing CSV file and avoid indexing duplicate events/rows?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211877#M41714</link>
      <description>&lt;P&gt;&lt;CODE&gt;| dedup _raw&lt;/CODE&gt; is a good first workaround.&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Sep 2015 08:33:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-handle-a-daily-changing-CSV-file-and-avoid-indexing/m-p/211877#M41714</guid>
      <dc:creator>HeinzWaescher</dc:creator>
      <dc:date>2015-09-14T08:33:46Z</dc:date>
    </item>
  </channel>
</rss>

