<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicate records in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93477#M19439</link>
    <description>&lt;P&gt;Any help would be good! &lt;/P&gt;</description>
    <pubDate>Fri, 13 Jul 2012 08:35:00 GMT</pubDate>
    <dc:creator>jaterlwj</dc:creator>
    <dc:date>2012-07-13T08:35:00Z</dc:date>
    <item>
      <title>Duplicate records</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93474#M19436</link>
      <description>&lt;P&gt;I am monitoring a file with, say, 24 rows, using the option "Continuously index data from a file or directory this Splunk instance can access".&lt;BR /&gt;
When I add a new row and refresh, there are now 49 rows: the original 24 records have been duplicated. Is there any option to stop duplicate rows?&lt;/P&gt;

&lt;P&gt;Here are some specifics.&lt;BR /&gt;
File format: .log&lt;BR /&gt;
Specify the source:"Continuously index data from a file or directory this Splunk instance can access."&lt;/P&gt;

&lt;P&gt;set host: constant value&lt;BR /&gt;
set source type: manual&lt;BR /&gt;
destination index:default&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jul 2012 05:31:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93474#M19436</guid>
      <dc:creator>jaterlwj</dc:creator>
      <dc:date>2012-07-09T05:31:34Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate records</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93475#M19437</link>
      <description>&lt;P&gt;Is your log file terminated with an end-of-file message, something like [END OF LOG FILE]?&lt;/P&gt;

&lt;P&gt;If so, this will confuse Splunk. Splunk keeps a CRC of the last 256 bytes it has read. If a termination message is constantly re-appended to your file, those bytes change and the CRC check fails. When this happens, Splunk rereads the whole file, thus duplicating records.&lt;/P&gt;
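
&lt;P&gt;To make the CRC check described above concrete, here is a minimal Python sketch. It is a toy model, not Splunk's actual implementation: the names Tailer, poll and WINDOW are hypothetical, and the only figure taken from this thread is the 256-byte window. It remembers a checksum of the last bytes it read and rereads the whole file whenever that checksum no longer matches.&lt;/P&gt;

```python
import zlib

WINDOW = 256  # bytes covered by the checksum, the figure quoted above


class Tailer:
    """Toy model of a file monitor; not Splunk's actual implementation."""

    def __init__(self):
        self.offset = 0       # how many bytes have been read so far
        self.seek_crc = None  # checksum of the window that ends at offset
        self.indexed = []     # every line "indexed" so far

    def _crc(self, data, end):
        # Checksum of (at most) the last WINDOW bytes before position end.
        return zlib.crc32(data[max(0, end - WINDOW):end])

    def poll(self, data):
        """Index new lines from data (the file's current bytes)."""
        # If the bytes we already read still look the same, resume from
        # the old offset; otherwise treat it as a new file and start over.
        unchanged = self.seek_crc == self._crc(data, self.offset)
        start = self.offset if unchanged else 0
        lines = data[start:].decode().splitlines()
        self.indexed.extend(lines)
        self.offset = len(data)
        self.seek_crc = self._crc(data, self.offset)
        return len(lines)


rows = b"".join(b"row %d\n" % i for i in range(1, 25))  # 24 rows
footer = b"[END OF LOG FILE]\n"

plain = Tailer()
plain.poll(rows)
print(plain.poll(rows + b"row 25\n"))            # 1: only the new row

moving = Tailer()
moving.poll(rows + footer)
print(moving.poll(rows + b"row 25\n" + footer))  # 26: whole file re-read
```

&lt;P&gt;In the plain-append case only the new row is picked up. In the footer case the new row lands where the footer used to be, the checksum of the previously read bytes changes, and every old row is indexed a second time.&lt;/P&gt;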

&lt;P&gt;See: &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Data/Howlogfilerotationishandled#How_Splunk_recognizes_log_rotation"&gt;splunk log rotation&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Mon, 09 Jul 2012 13:58:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93475#M19437</guid>
      <dc:creator>ak</dc:creator>
      <dc:date>2012-07-09T13:58:22Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate records</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93476#M19438</link>
      <description>&lt;P&gt;Hi, thank you for your reply! However, the log file that I used does not contain any end-of-file message.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jul 2012 01:28:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93476#M19438</guid>
      <dc:creator>jaterlwj</dc:creator>
      <dc:date>2012-07-10T01:28:46Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate records</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93477#M19439</link>
      <description>&lt;P&gt;Any help would be good! &lt;/P&gt;</description>
      <pubDate>Fri, 13 Jul 2012 08:35:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93477#M19439</guid>
      <dc:creator>jaterlwj</dc:creator>
      <dc:date>2012-07-13T08:35:00Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate records</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93478#M19440</link>
      <description>&lt;P&gt;Check the _internal index. It appears the whole file is being reread, hence 24 + 25 = 49 rows.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jul 2012 14:05:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93478#M19440</guid>
      <dc:creator>ak</dc:creator>
      <dc:date>2012-07-13T14:05:41Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate records</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93479#M19441</link>
      <description>&lt;P&gt;Hi, Thanks for your reply.&lt;/P&gt;

&lt;P&gt;Pardon my ignorance, but what should I look for in the _internal index? There are roughly 1.7 million events in there. &lt;span class="lia-unicode-emoji" title=":face_with_open_mouth:"&gt;😮&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Jul 2012 02:23:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Duplicate-records/m-p/93479#M19441</guid>
      <dc:creator>jaterlwj</dc:creator>
      <dc:date>2012-07-16T02:23:51Z</dc:date>
    </item>
  </channel>
</rss>

