<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Universal Forwarders and indexing in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59731#M11779</link>
    <description>&lt;P&gt;The ID is identical between the two versions of the extract; there is only ever ONE record.&lt;/P&gt;

&lt;P&gt;The CSV gets replaced, so Splunk indexes the file again and the same ID ends up in the index twice.&lt;/P&gt;</description>
    <pubDate>Thu, 24 May 2012 17:50:06 GMT</pubDate>
    <dc:creator>asarolkar</dc:creator>
    <dc:date>2012-05-24T17:50:06Z</dc:date>
    <item>
      <title>Universal Forwarders and indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59728#M11776</link>
      <description>&lt;P&gt;From a Windows box where the Universal Forwarder is installed, we're picking up a CSV extract (table.csv) every 24 hours.&lt;/P&gt;

&lt;P&gt;Each CSV contains exactly ONE row, and that row carries an ID.&lt;/P&gt;

&lt;P&gt;This is the forwarder configuration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor://D:\Splunk\Extract\table.csv]
disabled=0
followTail=0
index=alpha
sourcetype=alpha_sourcetype
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Problem Statement&lt;/P&gt;

&lt;P&gt;We're getting duplicates if the CSV extract is posted twice. We REPLACE the OLD extract with the NEW one, but Splunk indexes both because it is always monitoring the file and indexes every version it sees.&lt;/P&gt;

&lt;P&gt;Splunk therefore logs two entries for the same record (read: the same ID), one with each timestamp, i.e. one for EACH TIME the CSV was replaced:&lt;/P&gt;

&lt;P&gt;5/23/12   "2012-05-23 10:20:05.100000",&lt;BR /&gt;
5/23/12   "2012-05-23 10:19:05.100000",&lt;/P&gt;

&lt;P&gt;We only need the latest of these two.&lt;/P&gt;

&lt;P&gt;Is there any way to configure the forwarder so that, whenever a new version of the extract is posted, it only ever indexes the LATEST copy within a 24-hour period?&lt;/P&gt;

&lt;P&gt;If the forwarder cannot be configured that way, how would we modify the following query to pick only the latest entry?&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;index=alpha ID=* | sort ID&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Thu, 24 May 2012 17:01:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59728#M11776</guid>
      <dc:creator>asarolkar</dc:creator>
      <dc:date>2012-05-24T17:01:43Z</dc:date>
    </item>
    <item>
      <title>Re: Universal Forwarders and indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59729#M11777</link>
      <description>&lt;P&gt;Can you clarify: is this ID identical between the two versions of the extract, or does it change? Also, I assume you're getting duplicates of the entire file, not just one record. Is that correct?&lt;/P&gt;</description>
      <pubDate>Thu, 24 May 2012 17:38:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59729#M11777</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2012-05-24T17:38:10Z</dc:date>
    </item>
    <item>
      <title>Re: Universal Forwarders and indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59730#M11778</link>
      <description>&lt;P&gt;Seems to me the only possible way to do this is to wait until the end of the 24-hour period to see if a new version shows up, since I assume it's impossible to know ahead of time whether one is coming. This more or less defeats the point of having a forwarder monitor a file. If you're going to do that, write your own script and move the file into the batch directory once you've determined that it's safe to index it.&lt;/P&gt;
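
&lt;P&gt;A minimal sketch of such a script, in Python, assuming the extract is first written to a staging folder that Splunk does not watch, and that the forwarder picks files up from a separate batch folder (for example through a [batch://...] input with move_policy = sinkhole in place of the monitor stanza). The function name, the quiet_seconds parameter and any paths are made up for illustration; "safe to index" is approximated here as "not modified for a while", which may or may not fit your process.&lt;/P&gt;

```python
import os
import shutil
import time


def move_if_settled(extract_path, batch_dir, quiet_seconds, now=None):
    """Move extract_path into batch_dir once it has gone unmodified for
    quiet_seconds. Returns the destination path, or None if nothing moved.

    All names here are hypothetical; adapt them to your own layout.
    """
    if not os.path.isfile(extract_path):
        return None  # no extract is waiting in the staging folder

    # "now" can be injected for testing; default to the current time.
    now = time.time() if now is None else now
    age = now - os.path.getmtime(extract_path)
    if quiet_seconds > age:
        return None  # replaced too recently; a newer copy may still arrive

    # Hand the file over to the folder the forwarder reads from.
    os.makedirs(batch_dir, exist_ok=True)
    destination = os.path.join(batch_dir, os.path.basename(extract_path))
    shutil.move(extract_path, destination)
    return destination
```

&lt;P&gt;You would run this from a scheduled task near the end of the 24-hour window, e.g. move_if_settled(r"D:\Splunk\Staging\table.csv", r"D:\Splunk\Batch", 6 * 3600), where both folders are placeholders. Only the last copy written to staging ever reaches the forwarder, so earlier versions are never indexed.&lt;/P&gt;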

&lt;P&gt;I guess you can also just use:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| dedup ID
&lt;/CODE&gt;&lt;/PRE&gt;

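&lt;P&gt;since that just returns the most recent entry for each ID (search results come back newest first, and dedup keeps the first event it sees for each value), but you really haven't described your data enough for me to know whether that would actually work.&lt;/P&gt;</description>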
      <pubDate>Thu, 24 May 2012 17:43:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59730#M11778</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2012-05-24T17:43:34Z</dc:date>
    </item>
    <item>
      <title>Re: Universal Forwarders and indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59731#M11779</link>
      <description>&lt;P&gt;The ID is identical between the two versions of the extract; there is only ever ONE record.&lt;/P&gt;

&lt;P&gt;The CSV gets replaced, so Splunk indexes the file again and the same ID ends up in the index twice.&lt;/P&gt;</description>
      <pubDate>Thu, 24 May 2012 17:50:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59731#M11779</guid>
      <dc:creator>asarolkar</dc:creator>
      <dc:date>2012-05-24T17:50:06Z</dc:date>
    </item>
    <item>
      <title>Re: Universal Forwarders and indexing</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59732#M11780</link>
      <description>&lt;P&gt;Thanks for your answer!&lt;/P&gt;

&lt;P&gt;We were hoping not to use dedup, but rather to coerce Splunk into giving us only the latest (or, more appropriately, the LAST) set of timestamps for each record (read: each ID) and ignore the FIRST or EARLIER timestamps it indexed.&lt;/P&gt;</description>
      <pubDate>Thu, 24 May 2012 17:52:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarders-and-indexing/m-p/59732#M11780</guid>
      <dc:creator>asarolkar</dc:creator>
      <dc:date>2012-05-24T17:52:13Z</dc:date>
    </item>
  </channel>
</rss>

