We have some customers indexing recovery data from a data outage. Each file holds 15-30 minutes of logging and can be up to several GB.
So far they have been using a standard monitor input, but they have been pulling files out of the monitored folder. They were "guessing" when Splunk had finished indexing instead of validating with event counts. I have checked, and some of the files were only partially ingested.
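For the event-count validation mentioned above, here is a rough sketch of the kind of check I have in mind. It assumes one event per line, which will not hold for multi-line events, and the function names `count_lines` and `line_counts` are just illustrative. The idea is to get a per-file line count that can be compared against the event count Splunk reports for the same source.

```python
from pathlib import Path


def count_lines(path, chunk_size=1024 * 1024):
    """Count lines in a file, reading in chunks so multi-GB files are not loaded whole."""
    count = 0
    last = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last and last != b"\n":
        count += 1
    return count


def line_counts(folder):
    """Return {file name: line count} for every regular file directly in folder."""
    return {p.name: count_lines(p) for p in sorted(Path(folder).iterdir()) if p.is_file()}
```

Any file whose line count is higher than its indexed event count would be a candidate for partial ingestion, assuming the one-event-per-line assumption holds for that sourcetype.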
I want to move them to a batch monitor, but I have some questions:
Will these files be re-indexed fully, or will they resume based on CRC?
If a file has already been fully indexed with the standard monitor, will it be skipped if moved to the batch folder?
Is the CRC tracked separately for each input, or is it shared across all inputs, so that a file already seen by one input is recognized by another?
If they will not resume, how would you suggest we remediate the issue without creating duplicate events?