[monitor:///var/log/application/active/*.log]
disabled=0
sourcetype=application
index=application

[monitor:///var/log/application/rotated/*.log]
disabled=0
sourcetype=application
index=application
If I understand the CRC that Splunk calculates, when
is rotated to
the log events should not be duplicated because the first 256 bytes remained the same.
Except my entire file is duplicated, with splunkd.log stating: Normal record was not found for initCrc=0xbd68c9187f8e7490.
Is this because it's in a different directory or a different inputs.conf stanza? I'm not using
crcSalt=<SOURCE>, so I did not expect the directory to make a difference. Can anyone explain the detail I'm missing here?
One of my other test cases gave me the clue to the cause here. The log file is slightly cryptic, but my conclusion seems to make sense. I could not find documentation to confirm this though.
The warning here is that a file smaller than 256 bytes must not be rotated. If it is, its content will be re-indexed, causing duplication. This is because the rotated file will have a different absolute path and/or name, and for a file this small that makes Splunk think it is a new file.
Here, Splunk finds a new file, smaller than 256 bytes:
05-04-2018 16:33:30.426 +0000 DEBUG WatchedFile - Normal record was not found for initCrc=0x8d22bc7af0b12e35
05-04-2018 16:33:30.427 +0000 DEBUG WatchedFile - Reached EOF: fname=/var/log/cpauto/test1/test_compress1013.log fishstate=key=0x8d22bc7af0b12e35 sptr=156 scrc=0x5fa01acd024c2876 fnamecrc=0x8d22bc7af0b12e35 modtime=1525451610
Notice that the fishstate key, 0x8d22bc7af0b12e35, is not a content-based CRC: it is identical to fnamecrc, the CRC of the file name. The file had only reached 156 bytes (sptr=156) at this point.
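To make that comparison less error-prone than eyeballing hex strings, here is a small Python sketch that pulls the numeric fields out of the "Reached EOF" line quoted above and compares them. The helper name parse_fields and the regular expression are my own, not anything Splunk provides, and the field meanings are my reading of the log rather than documented behaviour.

```python
import re

# Debug line copied from the splunkd.log excerpt quoted above.
LINE = (
    "05-04-2018 16:33:30.427 +0000 DEBUG WatchedFile - Reached EOF: "
    "fname=/var/log/cpauto/test1/test_compress1013.log "
    "fishstate=key=0x8d22bc7af0b12e35 sptr=156 "
    "scrc=0x5fa01acd024c2876 fnamecrc=0x8d22bc7af0b12e35 "
    "modtime=1525451610"
)

def parse_fields(line):
    """Collect name=value tokens whose value is a hex or decimal number.

    Hypothetical helper: fname is skipped because its value is a path,
    and 'fishstate=key=0x...' yields the inner 'key' field.
    """
    return dict(re.findall(r"\b(\w+)=(0x[0-9a-f]+|\d+)\b", line))

fields = parse_fields(LINE)

# True when the tracking key is the file name CRC.
key_is_fname_crc = fields["key"] == fields["fnamecrc"]
# True when the tracking key is the scrc value instead.
key_is_scrc = fields["key"] == fields["scrc"]
# How many bytes Splunk had read when it hit EOF.
bytes_read = int(fields["sptr"])

print(key_is_fname_crc, key_is_scrc, bytes_read)
```

For this line the key matches fnamecrc and not scrc, with fewer than 256 bytes read, which is what led me to the conclusion above.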
If this file is rotated before it reaches 256 bytes, the file name will be different and will therefore have a different CRC, causing Splunk to think it's a new file.
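A toy model of the behaviour I think I am seeing, in Python. This is only a sketch of my conclusion, not Splunk's implementation: zlib.crc32 stands in for whatever CRC Splunk really computes, tracking_key is a made-up function, and the app.log paths are hypothetical examples.

```python
import zlib

# Bytes of content needed before a content-based key is possible (per the discussion above).
INIT_CRC_LENGTH = 256

def tracking_key(path, content):
    """Toy model: key on the first 256 bytes if present, otherwise on the path.

    Assumption: this mirrors my reading of the debug log, nothing more.
    """
    if len(content) >= INIT_CRC_LENGTH:
        basis = content[:INIT_CRC_LENGTH]
    else:
        basis = path.encode("utf-8")
    return zlib.crc32(basis)

# Hypothetical file locations before and after rotation.
active = "/var/log/application/active/app.log"
rotated = "/var/log/application/rotated/app.log"

small = b"x" * 156   # below the 256-byte threshold, like the test file above
large = b"y" * 300   # above the threshold

# A changed key means the rotated file looks new and would be indexed again.
small_reindexed = tracking_key(active, small) != tracking_key(rotated, small)
large_reindexed = tracking_key(active, large) != tracking_key(rotated, large)

print("small file re-indexed after rotation:", small_reindexed)
print("large file re-indexed after rotation:", large_reindexed)
```

In this model only the small file changes key when it moves, because its key depends on the path and not on the content.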
I was surprised to find, perhaps when I should not have been, that Splunk is extremely quick at reading files. In my tests I found Splunk to typically read a new file at least twice before it has even reached the init CRC minimum of 256 bytes. This means almost all files will start with a file name based CRC, and not the content based CRC, even if the first two log events written to the file are larger than 256 bytes. Probability of this being a problem is silly low. Except, perhaps, for applications that log next to nothing. Perhaps size-based rotation is your friend here.
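As an illustration of size-based rotation, here is a minimal sketch that assumes the application happens to be written in Python and logs through the standard logging module. The logger name, message text, and the 4096-byte limit are arbitrary choices for the example. The point is that a file is only rotated once it is close to maxBytes, so it is well past 256 bytes by then.

```python
import logging
import logging.handlers
import os
import tempfile

MIN_TRACKED_BYTES = 256   # threshold discussed above
MAX_BYTES = 4096          # arbitrary rotation size, far above the threshold

# Write into a throwaway directory so the sketch is safe to run anywhere.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Rotate when the file would grow past MAX_BYTES, keeping up to 3 old files.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=MAX_BYTES, backupCount=3
)
logger = logging.getLogger("rotation_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example output out of the root logger
logger.addHandler(handler)

# Emit enough short events to force at least one rotation.
for i in range(200):
    logger.info("event %04d: a short application message", i)
handler.close()

# Sizes of every rotated file (everything except the active app.log).
rotated_sizes = [
    os.path.getsize(os.path.join(log_dir, name))
    for name in os.listdir(log_dir)
    if name != "app.log"
]
print(rotated_sizes)
```

An application that logs almost nothing would simply never rotate under this scheme, which avoids the small-file case entirely.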
My guess would be that it is because of the 2 stanzas.
Perhaps try combining them into one stanza:
[monitor:///var/log/application/(active|rotated)/*.log]
disabled=0
sourcetype=application
index=application