<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is there a way to setup splunk to not create duplicate events when indexing a csv file overwritten with duplicate data? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32550#M5808</link>
    <description>&lt;P&gt;By default, Splunk builds a CRC from the first 256 bytes of a file, regardless of whether it is a CSV or any other file type.&lt;BR /&gt;
Is it possible that, even though the contents aren't changing, some other header information is changing?&lt;/P&gt;

&lt;P&gt;If you have a look at &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf"&gt;http://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf&lt;/A&gt;, you can change the number of bytes the CRC is built from by setting initCrcLength.&lt;/P&gt;

&lt;P&gt;Otherwise, you may be better off storing the CSV in a lookups folder and searching it as a lookup.&lt;BR /&gt;
That way you could build a dashboard that uses | inputlookup to pull in the CSV and then search it for certain criteria.&lt;/P&gt;</description>
    <pubDate>Sat, 16 Feb 2013 08:49:59 GMT</pubDate>
    <dc:creator>Drainy</dc:creator>
    <dc:date>2013-02-16T08:49:59Z</dc:date>
    <item>
      <title>Is there a way to setup splunk to not create duplicate events when indexing a csv file overwritten with duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32548#M5806</link>
      <description>&lt;P&gt;We have a process to identify, capture, and write high-priority/urgent events to a csv file that is overwritten every time the process executes.  The contents may not change for days.  However, Splunk indexes the whole file every time the process runs, even if the contents of the file haven't changed.&lt;/P&gt;

&lt;P&gt;The program that creates the csv file calls an external vendor's SOAP web service. I could add a whole bunch of logic to the program to persist a timestamp and use it as a filter for new service responses, but we prefer to index data as it is received/logged.  I'm not sure what CRC values Splunk is using to determine how to read the file.  The next step is to see whether those values are available in a debug/log file.&lt;/P&gt;

&lt;P&gt;Has anybody run into this and found a solution?  I don't rule out user error; I'm fairly new to Splunk.&lt;/P&gt;

&lt;P&gt;Any help would be appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Feb 2013 15:25:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32548#M5806</guid>
      <dc:creator>dlovett</dc:creator>
      <dc:date>2013-02-14T15:25:08Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to setup splunk to not create duplicate events when indexing a csv file overwritten with duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32549#M5807</link>
      <description>&lt;P&gt;I'm also fairly new to Splunk and have been searching for an answer to this problem for the last two days. I see several similar questions with no clear answer. I am also using csv files that are frequently overwritten with the same or new data, and each time Splunk re-indexes the data, creating duplicates. I'm getting the following in splunkd.log:&lt;/P&gt;

&lt;P&gt;WatchedFile - Checksum for seekptr didn't match, will re-read entire file='D:\fd\myfile.csv'.&lt;/P&gt;

&lt;P&gt;Not sure if this is related. Any help is appreciated!&lt;/P&gt;</description>
      <pubDate>Fri, 15 Feb 2013 00:44:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32549#M5807</guid>
      <dc:creator>handygecko</dc:creator>
      <dc:date>2013-02-15T00:44:00Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to setup splunk to not create duplicate events when indexing a csv file overwritten with duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32550#M5808</link>
      <description>&lt;P&gt;By default, Splunk builds a CRC from the first 256 bytes of a file, regardless of whether it is a CSV or any other file type.&lt;BR /&gt;
Is it possible that, even though the contents aren't changing, some other header information is changing?&lt;/P&gt;

&lt;P&gt;If you have a look at &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf"&gt;http://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf&lt;/A&gt;, you can change the number of bytes the CRC is built from by setting initCrcLength.&lt;/P&gt;
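
&lt;P&gt;As a rough sketch (the monitor stanza and path below are illustrative, not taken from your setup), the tweak in inputs.conf might look like this:&lt;/P&gt;

&lt;PRE&gt;
[monitor://D:\fd\myfile.csv]
# Build the initial CRC from more of the file than the 256-byte default,
# so files whose first bytes are identical can still be told apart.
initCrcLength = 1024
&lt;/PRE&gt;

&lt;P&gt;Restart the forwarder (or splunkd on a standalone instance) after editing inputs.conf so the new setting takes effect.&lt;/P&gt;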

&lt;P&gt;Otherwise, you may be better off storing the CSV in a lookups folder and searching it as a lookup.&lt;BR /&gt;
That way you could build a dashboard that uses | inputlookup to pull in the CSV and then search it for certain criteria.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Feb 2013 08:49:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32550#M5808</guid>
      <dc:creator>Drainy</dc:creator>
      <dc:date>2013-02-16T08:49:59Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to setup splunk to not create duplicate events when indexing a csv file overwritten with duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32551#M5809</link>
      <description>&lt;P&gt;Problem solved.  The issue was a bug in the program: it was opening and closing the file multiple times during the process.  I'm guessing this is why the CRC wasn't matching.  I refactored the code and all is well.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2013 14:12:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Is-there-a-way-to-setup-splunk-to-not-create-duplicate-events/m-p/32551#M5809</guid>
      <dc:creator>dlovett</dc:creator>
      <dc:date>2013-03-07T14:12:28Z</dc:date>
    </item>
  </channel>
</rss>

