<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Rotated log file to another directory causes duplication in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375256#M67979</link>
    <description>&lt;P&gt;One of my other test cases gave me the clue to the cause here. The log file is slightly cryptic, but my conclusion seems to make sense. I could not find documentation to confirm this though.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;TL;DR&lt;/STRONG&gt;&lt;BR /&gt;
The warning here is that a file smaller than 256 bytes must not be rotated. If it is, its content will be re-indexed, causing duplication. This is because a rotated file smaller than 256 bytes will have a different absolute file path and/or name, causing Splunk to think it's a new file.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Logs&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Here, Splunk finds a new file, smaller than 256 bytes:&lt;BR /&gt;
05-04-2018 16:33:30.426 +0000 DEBUG WatchedFile - Normal record was not found for initCrc=&lt;STRONG&gt;0x8d22bc7af0b12e35&lt;/STRONG&gt;&lt;BR /&gt;
05-04-2018 16:33:30.427 +0000 DEBUG WatchedFile - Reached EOF: fname=/var/log/cpauto/test1/test_compress1013.log fishstate=key=0x8d22bc7af0b12e35 sptr=156 scrc=0x5fa01acd024c2876 &lt;STRONG&gt;fnamecrc=0x8d22bc7af0b12e35&lt;/STRONG&gt; modtime=1525451610&lt;/P&gt;

&lt;P&gt;Notice that the initCrc and the fishstate key, 0x8d22bc7af0b12e35, are identical to the file name CRC (fnamecrc=0x8d22bc7af0b12e35), so the lookup appears to be keyed on the file name rather than on the file content.&lt;/P&gt;

&lt;P&gt;If this file is rotated before it reaches 256 bytes, the file name will be different and will therefore have a different CRC, causing Splunk to think it's a new file.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Comments&lt;/STRONG&gt;&lt;BR /&gt;
I was surprised to find, though perhaps I should not have been, that Splunk is extremely quick at reading files. In my tests, Splunk typically read a new file &lt;STRONG&gt;at least twice&lt;/STRONG&gt; before the file had even reached the initCrc minimum of 256 bytes. This means almost all files will start with a file-name-based CRC rather than a content-based CRC, even if the first two log events written to the file are larger than 256 bytes. The probability of this being a problem is silly low, except perhaps for applications that log next to nothing. Perhaps size-based rotation is your friend here.&lt;/P&gt;</description>
    <pubDate>Fri, 04 May 2018 17:29:07 GMT</pubDate>
    <dc:creator>iamjvn</dc:creator>
    <dc:date>2018-05-04T17:29:07Z</dc:date>
    <item>
      <title>Rotated log file to another directory causes duplication</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375251#M67974</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Test inputs.conf&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///var/log/application/active/*.log]
disabled=0
sourcetype=application
index=application

[monitor:///var/log/application/rotated/*.log]
disabled=0
sourcetype=application
index=application
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;Expected result:&lt;/STRONG&gt;&lt;BR /&gt;
If I understand the CRC that Splunk calculates correctly, when &lt;BR /&gt;
&lt;CODE&gt;/var/log/application/active/application.log&lt;/CODE&gt;&lt;BR /&gt;
is rotated to &lt;BR /&gt;
&lt;CODE&gt;/var/log/application/rotated/application.20171231.log&lt;/CODE&gt;&lt;BR /&gt;
the log events should not be duplicated because the first 256 bytes remain the same.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Actual result:&lt;/STRONG&gt;&lt;BR /&gt;
Instead, my entire file is duplicated, with splunkd.log stating: Normal record was not found for initCrc=0xbd68c9187f8e7490.&lt;/P&gt;

&lt;P&gt;Is this because it's in a different directory or a different inputs.conf stanza? I'm not using &lt;CODE&gt;crcSalt=&amp;lt;SOURCE&amp;gt;&lt;/CODE&gt;, so I did not expect the directory to make a difference. Can anyone explain the detail I'm missing here?&lt;/P&gt;</description>
      <pubDate>Thu, 03 May 2018 23:01:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375251#M67974</guid>
      <dc:creator>iamjvn</dc:creator>
      <dc:date>2018-05-03T23:01:27Z</dc:date>
    </item>
    <item>
      <title>Re: Rotated log file to another directory causes duplication</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375252#M67975</link>
      <description>&lt;P&gt;My question here would be, why do you monitor the rotated directory in the first place?&lt;/P&gt;</description>
      <pubDate>Thu, 03 May 2018 23:33:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375252#M67975</guid>
      <dc:creator>MuS</dc:creator>
      <dc:date>2018-05-03T23:33:14Z</dc:date>
    </item>
    <item>
      <title>Re: Rotated log file to another directory causes duplication</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375253#M67976</link>
      <description>&lt;P&gt;This is a test case to understand how Splunk monitoring (really) works.&lt;/P&gt;</description>
      <pubDate>Fri, 04 May 2018 00:04:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375253#M67976</guid>
      <dc:creator>iamjvn</dc:creator>
      <dc:date>2018-05-04T00:04:41Z</dc:date>
    </item>
    <item>
      <title>Re: Rotated log file to another directory causes duplication</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375254#M67977</link>
      <description>&lt;P&gt;To finish reading files that were rotated before Splunk had read all the way to the end of the file?&lt;/P&gt;</description>
      <pubDate>Fri, 04 May 2018 00:07:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375254#M67977</guid>
      <dc:creator>FrankVl</dc:creator>
      <dc:date>2018-05-04T00:07:34Z</dc:date>
    </item>
    <item>
      <title>Re: Rotated log file to another directory causes duplication</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375255#M67978</link>
      <description>&lt;P&gt;My guess would be that it is because of the 2 stanzas.&lt;/P&gt;

&lt;P&gt;Perhaps try combining them into one stanza:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[monitor:///var/log/application/(active|rotated)/*.log]
disabled=0
sourcetype=application
index=application
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 04 May 2018 00:10:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375255#M67978</guid>
      <dc:creator>FrankVl</dc:creator>
      <dc:date>2018-05-04T00:10:48Z</dc:date>
    </item>
    <item>
      <title>Re: Rotated log file to another directory causes duplication</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375256#M67979</link>
      <description>&lt;P&gt;One of my other test cases gave me the clue to the cause here. The log file is slightly cryptic, but my conclusion seems to make sense. I could not find documentation to confirm this though.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;TL;DR&lt;/STRONG&gt;&lt;BR /&gt;
The warning here is that a file smaller than 256 bytes must not be rotated. If it is, its content will be re-indexed, causing duplication. This is because a rotated file smaller than 256 bytes will have a different absolute file path and/or name, causing Splunk to think it's a new file.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Logs&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Here, Splunk finds a new file, smaller than 256 bytes:&lt;BR /&gt;
05-04-2018 16:33:30.426 +0000 DEBUG WatchedFile - Normal record was not found for initCrc=&lt;STRONG&gt;0x8d22bc7af0b12e35&lt;/STRONG&gt;&lt;BR /&gt;
05-04-2018 16:33:30.427 +0000 DEBUG WatchedFile - Reached EOF: fname=/var/log/cpauto/test1/test_compress1013.log fishstate=key=0x8d22bc7af0b12e35 sptr=156 scrc=0x5fa01acd024c2876 &lt;STRONG&gt;fnamecrc=0x8d22bc7af0b12e35&lt;/STRONG&gt; modtime=1525451610&lt;/P&gt;

&lt;P&gt;Notice that the initCrc and the fishstate key, 0x8d22bc7af0b12e35, are identical to the file name CRC (fnamecrc=0x8d22bc7af0b12e35), so the lookup appears to be keyed on the file name rather than on the file content.&lt;/P&gt;

&lt;P&gt;If this file is rotated before it reaches 256 bytes, the file name will be different and will therefore have a different CRC, causing Splunk to think it's a new file.&lt;/P&gt;
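
&lt;P&gt;The behaviour described above can be modelled with a small sketch. This is only an illustration of my conclusion, not Splunk's actual implementation: the function names are hypothetical, and zlib.crc32 is a stand-in for whatever checksum Splunk really computes. It assumes a file is tracked by a CRC of its first 256 bytes once that much content exists, and by a CRC of its absolute path otherwise.&lt;/P&gt;

```python
import os
import tempfile
import zlib

INIT_CRC_LEN = 256  # the 256-byte minimum discussed in this thread


def tracking_key(path):
    """Hypothetical model: content CRC once 256 bytes exist, else a path CRC."""
    with open(path, "rb") as handle:
        head = handle.read(INIT_CRC_LEN)
    if len(head) == INIT_CRC_LEN:
        # Enough content: the key depends only on the first 256 bytes.
        return ("content", zlib.crc32(head))
    # Too little content: assumed fallback to a key derived from the path.
    return ("name", zlib.crc32(os.fsencode(os.path.abspath(path))))


def rotation_keeps_key(payload):
    """Write payload to an active file, rename it, and compare the two keys."""
    with tempfile.TemporaryDirectory() as root:
        active = os.path.join(root, "active.log")
        rotated = os.path.join(root, "rotated.log")
        with open(active, "wb") as handle:
            handle.write(payload)
        before = tracking_key(active)
        os.rename(active, rotated)  # simulate rotation to a new name
        after = tracking_key(rotated)
    return before == after
```

&lt;P&gt;Under this model, a 300-byte file keeps the same key after the rename, while a 156-byte file (the sptr value in the log above) gets a different key and would look like a new file.&lt;/P&gt;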

&lt;P&gt;&lt;STRONG&gt;Comments&lt;/STRONG&gt;&lt;BR /&gt;
I was surprised to find, though perhaps I should not have been, that Splunk is extremely quick at reading files. In my tests, Splunk typically read a new file &lt;STRONG&gt;at least twice&lt;/STRONG&gt; before the file had even reached the initCrc minimum of 256 bytes. This means almost all files will start with a file-name-based CRC rather than a content-based CRC, even if the first two log events written to the file are larger than 256 bytes. The probability of this being a problem is silly low, except perhaps for applications that log next to nothing. Perhaps size-based rotation is your friend here.&lt;/P&gt;</description>
      <pubDate>Fri, 04 May 2018 17:29:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Rotated-log-file-to-another-directory-causes-duplication/m-p/375256#M67979</guid>
      <dc:creator>iamjvn</dc:creator>
      <dc:date>2018-05-04T17:29:07Z</dc:date>
    </item>
  </channel>
</rss>

