Good day! We need to understand this. I don't have the permissions to test this. Any help is appreciated.
Scenario: Server A runs a lightweight forwarder that sends approximately 1 million records per day to Splunk. Backups run nightly starting at 3:00 AM: the log file is backed up at 3:15 AM and the “fishbucket” at 3:30 AM. The server crashes at 5:00 PM. The sysadmins rebuild the server and restore the files. Does this result in option #2 or option #3 (shown below)?
Is the answer different if the log file is backed up at 3:45 AM and the “fishbucket” at 3:30 AM, with the server again crashing at 5:00 PM and the sysadmins rebuilding it and restoring the files?
If Splunk is started before the log file is restored, does that change the answer?
1. There is no begin and end CRC matching this file in the database. This indicates a new file. Splunk will pick it up and consume its data from the start of the file. Splunk updates the database with the new CRCs and seekPtrs as the file is being consumed.
2. The begin CRC and the end CRC are both present, but the size of the file is larger than the seekPtr Splunk stored. This means that, while Splunk has seen the file before, there has been data added to it since it was last read. Splunk opens the file, seeks to the previous end of the file, and starts reading from there. In this way, Splunk will only grab the new data and not anything it has read before.
3. The begin CRC is present, but the end CRC does not match. This means that Splunk has previously read the file but that some of the material that it read has since changed. In this case, Splunk must re-read the whole file.
Given the timing of your events (log file newer than the fishbucket), you'd be in case #3. You would see a message like this in your splunkd.log:
Checksum for seekptr didn't match, will re-read entire file='/path/to/files/server.log'
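Since you can't test this yourself, here is a rough sketch of the three cases from the question as a decision function. It is a simplified model of the behavior described above, not Splunk's actual implementation: the names `classify` and `remember`, the in-memory `db` dictionary standing in for the fishbucket, and the 256-byte `CRC_WINDOW` are all assumptions made for illustration. Treating a file shorter than the stored seekPtr as case #3 is also an assumption, since the three cases above do not cover it.

```python
import zlib
from typing import Dict, NamedTuple

CRC_WINDOW = 256  # assumed checksum window in bytes; illustrative only

NEW_FILE = "case 1: new file, read from the start"
APPENDED = "case 2: known file with new data, resume at seekPtr"
REREAD = "case 3: end CRC mismatch, re-read the whole file"
NO_CHANGE = "known file, nothing new to read"


class Record(NamedTuple):
    seek_ptr: int  # offset up to which the file has already been read
    end_crc: int   # CRC of the bytes immediately before seek_ptr


def _tail(content: bytes, seek_ptr: int) -> bytes:
    """Bytes in the checksum window that ends at seek_ptr."""
    return content[max(0, seek_ptr - CRC_WINDOW):seek_ptr]


def remember(content: bytes, db: Dict[int, Record]) -> None:
    """Record the file as fully read (hypothetical stand-in for the fishbucket)."""
    begin_crc = zlib.crc32(content[:CRC_WINDOW])
    end_crc = zlib.crc32(_tail(content, len(content)))
    db[begin_crc] = Record(len(content), end_crc)


def classify(content: bytes, db: Dict[int, Record]) -> str:
    """Decide which of the three cases applies to this file."""
    rec = db.get(zlib.crc32(content[:CRC_WINDOW]))
    if rec is None:
        return NEW_FILE
    # Assumption: a file shorter than seek_ptr is handled like a mismatch.
    if len(content) < rec.seek_ptr:
        return REREAD
    if zlib.crc32(_tail(content, rec.seek_ptr)) != rec.end_crc:
        return REREAD
    return APPENDED if len(content) > rec.seek_ptr else NO_CHANGE


# Small demonstration with made-up file contents.
fishbucket: Dict[int, Record] = {}
log = b"x" * 300
print(classify(log, fishbucket))                 # file never seen before
remember(log, fishbucket)
print(classify(log + b"new line\n", fishbucket)) # data appended since last read
print(classify(log[:299] + b"y", fishbucket))    # already-read data changed
```

In this model, what matters after a restore is whether the bytes just before the stored seekPtr in the restored log still match what the restored fishbucket recorded, which is why the relative backup times of the two files matter.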