<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: md5, crcSalt, gzip, oh my! in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38685#M7156</link>
    <description>&lt;P&gt;Helpful info, but won't actually do what I'm looking for.&lt;/P&gt;

&lt;P&gt;The problem with this proposed solution is that the gzip header timestamp sits within the first 256 bytes of the file, so a longer CRC sample still includes it and this won't work &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 02 Nov 2012 21:59:43 GMT</pubDate>
    <dc:creator>stensonb</dc:creator>
    <dc:date>2012-11-02T21:59:43Z</dc:date>
    <item>
      <title>md5, crcSalt, gzip, oh my!</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38683#M7154</link>
      <description>&lt;P&gt;Hello Splunkers -&lt;/P&gt;

&lt;P&gt;I'm having trouble figuring out how to make the following work.&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;I get usage files from a popular CDN delivered to me via FTP.  These files arrive gzipped, and Splunk handles that without any trouble.&lt;/LI&gt;
&lt;LI&gt;However, on rare occasions, we may have the usage files redelivered from said CDN.  When they are redelivered, the contents of the gzip are identical, but the modification time stored in the gzip header is different (the 4-byte MTIME field at byte offsets 4-7). That causes Splunk to re-index the file, which doubles my counts for that period.&lt;/LI&gt;
&lt;/OL&gt;
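
&lt;P&gt;To illustrate item 2, here is a minimal Python sketch, not a description of Splunk internals. It assumes the redelivered archive differs from the original only in the gzip header MTIME field (header offsets 4-7 per RFC 1952). The payload and both timestamps are made up, and zlib.crc32 over the first 256 bytes is only a stand-in for whatever head-of-file checksum Splunk actually computes.&lt;/P&gt;

```python
import gzip
import zlib

# Hypothetical usage lines standing in for the CDN's real log content.
PAYLOAD = b"2012-04-27 18:00:00 GET /asset.js 200 1024\n" * 4


def gzip_mtime(blob):
    """Read the little-endian MTIME field at offsets 4-7 of a gzip header."""
    return int.from_bytes(blob[4:8], "little")


# Same payload, two delivery times (the mtime keyword needs Python 3.8+).
first = gzip.compress(PAYLOAD, mtime=1335551469)
redelivered = gzip.compress(PAYLOAD, mtime=1335637869)

# Offsets at which the two archives differ.
differing = [i for i, (a, b) in enumerate(zip(first, redelivered)) if a != b]

# Stand-in for a head-of-file checksum; not Splunk's actual algorithm.
crc_first = zlib.crc32(first[:256])
crc_redelivered = zlib.crc32(redelivered[:256])

same_content = gzip.decompress(first) == gzip.decompress(redelivered)

print("differing offsets:", differing)
print("head checksums match:", crc_first == crc_redelivered)
print("decompressed content identical:", same_content)
```

&lt;P&gt;Under those assumptions the only differing offsets fall inside the MTIME field, yet any checksum taken over the start of the file changes, even though the decompressed content is identical.&lt;/P&gt;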

&lt;P&gt;So, I'm trying to get around Splunk using the first/last 256 bytes to determine uniqueness. I'd like to use something like:&lt;/P&gt;

&lt;P&gt;CHECK_METHOD=none&lt;BR /&gt;
crcSalt = &amp;lt;SOURCE&amp;gt;&lt;/P&gt;

&lt;P&gt;...which would use the filename as the ONLY factor when determining uniqueness, but "CHECK_METHOD=none" isn't an option.&lt;/P&gt;

&lt;P&gt;Can anybody suggest an alternative approach?&lt;/P&gt;</description>
      <pubDate>Fri, 27 Apr 2012 18:31:09 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38683#M7154</guid>
      <dc:creator>stensonb</dc:creator>
      <dc:date>2012-04-27T18:31:09Z</dc:date>
    </item>
    <item>
      <title>Re: md5, crcSalt, gzip, oh my!</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38684#M7155</link>
      <description>&lt;P&gt;Update: since Splunk 5.0 you can increase the length of the CRC sample.&lt;BR /&gt;
See:&lt;BR /&gt;
&lt;A href="http://docs.splunk.com/Documentation/Splunk/5.0/Admin/Inputsconf"&gt;http://docs.splunk.com/Documentation/Splunk/5.0/Admin/Inputsconf&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;&lt;BR /&gt;
initCrcLength = &amp;lt;integer&amp;gt;&lt;BR /&gt;
 * This setting adjusts how much of a file Splunk reads before trying to identify whether it is a file that has&lt;BR /&gt;
  already been seen.  You may want to adjust this if you have many files with common headers (comment headers,&lt;BR /&gt;
  long CSV headers, etc) and recurring filenames.&lt;BR /&gt;
 * CAUTION: Improper use of this setting will cause data to be reindexed.  You may wish to consult with Splunk&lt;BR /&gt;
  Support before adjusting this value - the default is fine for most installations.&lt;BR /&gt;
 * Defaults to 256 (bytes).&lt;BR /&gt;
 * Must be in the range 256-1048576.&lt;BR /&gt;
&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Nov 2012 21:44:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38684#M7155</guid>
      <dc:creator>yannK</dc:creator>
      <dc:date>2012-11-02T21:44:42Z</dc:date>
    </item>
    <item>
      <title>Re: md5, crcSalt, gzip, oh my!</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38685#M7156</link>
      <description>&lt;P&gt;Helpful info, but won't actually do what I'm looking for.&lt;/P&gt;

&lt;P&gt;The problem with this proposed solution is that the gzip header timestamp sits within the first 256 bytes of the file, so a longer CRC sample still includes it and this won't work &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Nov 2012 21:59:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/md5-crcSalt-gzip-oh-my/m-p/38685#M7156</guid>
      <dc:creator>stensonb</dc:creator>
      <dc:date>2012-11-02T21:59:43Z</dc:date>
    </item>
  </channel>
</rss>

