Rotated log file to another directory causes duplication

Explorer

Test inputs.conf

[monitor:///var/log/application/active/*.log]
disabled=0
sourcetype=application
index=application

[monitor:///var/log/application/rotated/*.log]
disabled=0
sourcetype=application
index=application

Expected result:
If I understand the CRC that Splunk calculates correctly, when
/var/log/application/active/application.log
is rotated to
/var/log/application/rotated/application.20171231.log
the log events should not be duplicated, because the first 256 bytes remain the same.
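To make that expectation concrete, here is a minimal Python sketch. It is not Splunk's implementation: zlib.crc32 stands in for whatever CRC Splunk computes, and the helper names head_crc and rotate_and_compare are made up for illustration. It only shows that a checksum over the first 256 bytes of a file does not depend on the file's directory or name.

```python
import os
import tempfile
import zlib

INIT_CRC_LENGTH = 256  # bytes of the file head covered by the check, per this post


def head_crc(path, length=INIT_CRC_LENGTH):
    """CRC of the first `length` bytes of a file (zlib.crc32 is a stand-in)."""
    with open(path, "rb") as handle:
        return zlib.crc32(handle.read(length))


def rotate_and_compare(payload):
    """Write payload under active/, move it to rotated/, return both CRCs."""
    with tempfile.TemporaryDirectory() as root:
        active = os.path.join(root, "active")
        rotated = os.path.join(root, "rotated")
        os.makedirs(active)
        os.makedirs(rotated)
        src = os.path.join(active, "application.log")
        dst = os.path.join(rotated, "application.20171231.log")
        with open(src, "wb") as handle:
            handle.write(payload)
        before = head_crc(src)
        os.rename(src, dst)  # simulate rotation into the other directory
        after = head_crc(dst)
        return before, after


# 760 bytes of sample events, comfortably above the 256-byte threshold
before, after = rotate_and_compare(b"2017-12-31 23:59:59 INFO sample event\n" * 20)
print(before == after)  # True: the head of the file is unchanged by the move
```

If Splunk keyed this file purely on its content, the rotated copy would be recognised as already read, which is what I expected to see.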

Actual result:
Instead, my entire file is duplicated, with splunkd.log stating: Normal record was not found for initCrc=0xbd68c9187f8e7490.

Is this because it's in a different directory or a different inputs.conf stanza? I'm not using crcSalt=<SOURCE>, so I did not expect the directory to make a difference. Can anyone explain the detail I'm missing here?

Explorer

One of my other test cases gave me the clue to the cause here. The log file is slightly cryptic, but my conclusion seems to make sense. I could not find documentation to confirm this, though.

TL;DR
The warning here is that a file smaller than 256 bytes must not be rotated. If it is, its content will be re-indexed, causing duplication. This is because the rotated file, still smaller than 256 bytes, will have a different absolute file path and/or name, causing Splunk to think it's a new file.

Logs

Here, Splunk finds a new file, smaller than 256 bytes:
05-04-2018 16:33:30.426 +0000 DEBUG WatchedFile - Normal record was not found for initCrc=0x8d22bc7af0b12e35
05-04-2018 16:33:30.427 +0000 DEBUG WatchedFile - Reached EOF: fname=/var/log/cpauto/test1/test_compress1013.log fishstate=key=0x8d22bc7af0b12e35 sptr=156 scrc=0x5fa01acd024c2876 fnamecrc=0x8d22bc7af0b12e35 modtime=1525451610

Notice that the CRC used as the initCrc and fishstate key (0x8d22bc7af0b12e35) is not a CRC of the file content, but the file name CRC (fnamecrc=0x8d22bc7af0b12e35).
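If you want to check this on your own splunkd.log lines, a small Python sketch like the following pulls the fields out of a "Reached EOF" line and compares them. The helper name parse_fishstate is made up, and reading sptr as the current read position in bytes is my assumption, not something I found documented.

```python
import re

# The DEBUG line from this post, copied verbatim
LOG_LINE = (
    "05-04-2018 16:33:30.427 +0000 DEBUG WatchedFile - Reached EOF: "
    "fname=/var/log/cpauto/test1/test_compress1013.log "
    "fishstate=key=0x8d22bc7af0b12e35 sptr=156 scrc=0x5fa01acd024c2876 "
    "fnamecrc=0x8d22bc7af0b12e35 modtime=1525451610"
)


def parse_fishstate(line):
    """Extract the key, sptr, scrc and fnamecrc fields from a 'Reached EOF' line."""
    fields = dict(re.findall(r"(key|sptr|scrc|fnamecrc)=(\S+)", line))
    fields["sptr"] = int(fields["sptr"])
    return fields


state = parse_fishstate(LOG_LINE)
print(state["key"] == state["fnamecrc"])  # True: the key equals the file name CRC
print(state["key"] == state["scrc"])      # False: the key is not the scrc value
print(state["sptr"] < 256)                # True: fewer than 256 bytes were read
```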

If this file is rotated before it reaches 256 bytes, the file name will be different and thus have a different CRC, causing Splunk to think it's a new file.
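Here is a rough Python model of the behaviour I am inferring from these logs. It is a guess at the logic, not Splunk's code: tracking_key and looks_new_after_rotation are hypothetical names, and zlib.crc32 stands in for Splunk's own CRC. Files with at least 256 bytes are keyed by content, smaller files by their path.

```python
import zlib

INIT_CRC_LENGTH = 256  # threshold described in this post


def tracking_key(path, content, length=INIT_CRC_LENGTH):
    """Hypothetical model: key by content once `length` bytes exist, else by path."""
    if len(content) >= length:
        return ("content", zlib.crc32(content[:length]))
    return ("name", zlib.crc32(path.encode("utf-8")))


def looks_new_after_rotation(old_path, new_path, content):
    """True if the rotated file gets a key that differs from the original's."""
    return tracking_key(old_path, content) != tracking_key(new_path, content)


old = "/var/log/application/active/application.log"
new = "/var/log/application/rotated/application.20171231.log"

small = b"x" * 156   # same size as sptr=156 in the log line above
large = b"x" * 1024  # well past the 256-byte threshold

# Small file: keyed by name, so the rotated copy looks like a new file
print(looks_new_after_rotation(old, new, small))
# Large file: keyed by content, so the rotated copy is recognised
print(looks_new_after_rotation(old, new, large))
```

Under this model only the small file is re-read after rotation, which matches the duplication I saw.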

Comments
I was surprised to find, though perhaps I should not have been, that Splunk is extremely quick at reading files. In my tests, Splunk typically read a new file at least twice before the file had even reached the initial CRC minimum of 256 bytes. This means almost all files will start with a file-name-based CRC, not a content-based CRC, even if the first two log events written to the file are larger than 256 bytes. The probability of this being a problem is silly low, except perhaps for applications that log next to nothing. Size-based rotation may be your friend here.

Ultra Champion

My guess would be that it is because of the 2 stanzas.

Perhaps try combining them into one stanza:

[monitor:///var/log/application/(active|rotated)/*.log]
disabled=0
sourcetype=application
index=application

SplunkTrust

My question here would be, why do you monitor the rotated directory in the first place?

Ultra Champion

To finish reading files that were rotated before Splunk had read all the way to the end of the file?

Explorer

This is a test case to understand how Splunk monitoring (really) works.
