<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Remove Multiline Log Duplicates in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Remove-Multiline-Log-Duplicates/m-p/106177#M22351</link>
    <description>&lt;P&gt;Rather than have Splunk index the FTP'd file directly, you could have a script run after each FTP transfer to extract just the unique events into a new file, and have Splunk monitor that file instead.&lt;/P&gt;</description>
    <pubDate>Thu, 26 Sep 2013 02:29:40 GMT</pubDate>
    <dc:creator>BenAveling</dc:creator>
    <dc:date>2013-09-26T02:29:40Z</dc:date>
    <item>
      <title>Remove Multiline Log Duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Remove-Multiline-Log-Duplicates/m-p/106176#M22350</link>
      <description>&lt;P&gt;I am trying to figure out an approach to a multiline log file problem. The device that generates the file writes it like a regular running log file, but the file becomes FIFO once it reaches 10MB. The only way I can get this file is via FTP, so I have Splunk monitor the download path. I managed to get my multiline event breaking working correctly for the most part, aside from a few stray events that are truncated at the source, but I can live with that. The issue is that if I simply overwrite the file with a newly downloaded copy, many events are duplicated, since the first 256 bytes of the file have a different CRC than before, and so do the last 256 bytes. It is really the middle portion of the file that is potentially the same.&lt;/P&gt;

&lt;P&gt;Is there any tweak or method anyone can suggest to deal with this situation, with the goal of not indexing any duplicate events?&lt;/P&gt;

&lt;P&gt;First FTP of File Example&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;MSCi      MSS01                     2010-12-08  09:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Second FTP of File ~1 hour later&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;MSCi      MSS01                     2010-12-08  10:43:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:47:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:46:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:45:09
40+ random lines
END OF REPORT

MSCi      MSS01                     2010-12-08  09:44:09
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Jerrad&lt;/P&gt;</description>
      <pubDate>Fri, 17 Dec 2010 12:29:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Remove-Multiline-Log-Duplicates/m-p/106176#M22350</guid>
      <dc:creator>jerrad</dc:creator>
      <dc:date>2010-12-17T12:29:34Z</dc:date>
    </item>
    <item>
      <title>Re: Remove Multiline Log Duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Remove-Multiline-Log-Duplicates/m-p/106177#M22351</link>
      <description>&lt;P&gt;Rather than have Splunk index the FTP'd file directly, you could have a script run after each FTP transfer to extract just the unique events into a new file, and have Splunk monitor that file instead.&lt;/P&gt;</description>
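      <content:encoded><![CDATA[
The script idea above could look roughly like the following. This is only a minimal sketch, not a tested solution for the actual device: it assumes every event starts with a header line shaped like the one in the question and ends with "END OF REPORT", and the names split_events, append_new_events, and the file paths in the usage note are made up for illustration. Events that are truncated (no "END OF REPORT") are skipped, on the assumption that losing those is acceptable.

```python
import hashlib
import re
from pathlib import Path

# Assumption: every event starts with a header line like the one in the
# question ("MSCi      MSS01      2010-12-08  09:43:09") and ends with
# "END OF REPORT". Adjust the pattern to the real device output.
HEADER = re.compile(r"^MSCi\s+\S+\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s*$")
FOOTER = "END OF REPORT"


def split_events(text):
    """Return only complete events (header line through END OF REPORT)."""
    events, current = [], None
    for line in text.splitlines():
        if HEADER.match(line):
            current = [line]  # a new header discards any unfinished event
        elif current is not None:
            current.append(line)
            if line.strip() == FOOTER:
                events.append("\n".join(current))
                current = None
    return events


def append_new_events(downloaded, output, seen_file):
    """Append events not seen before to `output`; return how many were added."""
    # The state file holds one SHA-256 hex digest per previously written event.
    seen = set(seen_file.read_text().split()) if seen_file.exists() else set()
    fresh = []
    for event in split_events(downloaded.read_text()):
        digest = hashlib.sha256(event.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(event)
    if fresh:
        with output.open("a") as out:
            out.write("\n\n".join(fresh) + "\n\n")
        seen_file.write_text("\n".join(sorted(seen)) + "\n")
    return len(fresh)
```

After each FTP download you would call something like append_new_events(Path("download/device.log"), Path("unique/device.log"), Path("unique/seen.txt")) and point the Splunk monitor input at the unique file rather than the download path (all three paths are hypothetical). Because the output file is only ever appended to, Splunk sees it as a normal growing log. The state file grows without bound in this sketch, so in practice you would probably want to prune old digests.
]]></content:encoded>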
      <pubDate>Thu, 26 Sep 2013 02:29:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Remove-Multiline-Log-Duplicates/m-p/106177#M22351</guid>
      <dc:creator>BenAveling</dc:creator>
      <dc:date>2013-09-26T02:29:40Z</dc:date>
    </item>
  </channel>
</rss>

