Even though the official docs are not very explicit on this aspect, TailingProcessor (for plain log files) and ArchiveProcessor (for archived logs) have slightly different implementations:
both TailingProcessor and ArchiveProcessor behave the same way if the outcome of the CRC check is "no match" -> new file. The log will be read and ingested by Splunk in full.
if the CRC check validation outcome is "match" then we have a difference between the 2 components:
TailingProcessor will also check whether the size has changed and new events have been added. If that is the case, then the new events will be read and ingested as well.
ArchiveProcessor, once the CRC check validation outcome is confirmed to be a "match" (already known file), will assume that the file is "old" and already ingested and will just skip any other processing.
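To make the difference concrete, here is a rough Python sketch of the decision logic as described above. It is only a model of the observed behaviour, not Splunk's actual implementation: the names `KnownFile` and `decide` and the returned strings are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownFile:
    """What the forwarder is assumed to remember about a file (hypothetical)."""
    crc: int         # checksum recorded when the file was first seen
    bytes_read: int  # how far into the file ingestion got


def decide(processor: str, crc: int, size: int, known: Optional[KnownFile]) -> str:
    """Return the action described in this post for a single file.

    processor is "tailing" (plain log) or "archive" (archived log).
    """
    if known is None or known.crc != crc:
        # CRC "no match": both processors treat the file as new.
        return "read in full"
    if processor == "tailing":
        # CRC "match": TailingProcessor also looks at the size.
        if size > known.bytes_read:
            return f"read from offset {known.bytes_read}"
        return "skip (nothing new)"
    # CRC "match": ArchiveProcessor assumes the file was already ingested.
    return "skip (assumed already ingested)"


if __name__ == "__main__":
    seen = KnownFile(crc=0xABCD, bytes_read=1000)
    print(decide("tailing", 0xABCD, 1500, seen))  # read from offset 1000
    print(decide("archive", 0xABCD, 1500, seen))  # skip (assumed already ingested)
```

The last call is the problematic case: the archive contains events beyond what was read before, but they are never picked up.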
For the time being, these are the possibilities I see to move this forward (I have tried to summarise them in 3 macro categories):
A. sourcelog management logic change, for example:
A.1. instead of archiving the older logs, simply rotate or rename them as plain log files, so that the ArchiveProcessor component is avoided.
A.2. read only archive files and not plain log files, so that all archive files are always seen as "new" and are read and ingested in full by the ArchiveProcessor, with no chance of missing events.
A.3. extend the log management logic so that it becomes aware of an ongoing Splunk outage, including when it started and ended. For example, archives generated during a Splunk outage could be extracted again immediately after Splunk is back available, so that the TailingProcessor can finish reading all the new events which had been ignored/skipped by the ArchiveProcessor earlier.
A.4. I am sure that there are other variants of things which could be done in order to change the sourcelog management on the UF servers in question.
B. a custom input implementation, for example a custom scripted or a modular input.
C. a Splunk Idea (new portal for ERs and FRs) to be raised for future evaluation/implementation based on the PM prioritisation.
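As an illustration of option A.3, here is a minimal Python sketch that extracts again the archives created during an outage window. Everything in it is an assumption for the sake of the example: the function name `reextract_outage_archives`, the `.gz` archive format, the use of the file modification time to decide which archives fall inside the outage, and the fact that the outage start/end are already known (how the log management would learn about them is not covered here).

```python
import gzip
import shutil
from pathlib import Path


def reextract_outage_archives(log_dir, outage_start, outage_end, out_dir):
    """Decompress every *.gz in log_dir whose mtime falls inside the outage.

    outage_start / outage_end are epoch seconds (hypothetical inputs).
    Returns the list of plain files written to out_dir.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(Path(log_dir).glob("*.gz")):
        mtime = archive.stat().st_mtime
        if not (outage_start <= mtime <= outage_end):
            continue  # archived outside the outage window, leave it alone
        target = out / archive.stem  # e.g. "app.log.1.gz" -> "app.log.1"
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

The output directory would have to be one that the UF monitors as plain files, and the extracted copies would need to be cleaned up afterwards. Whether the forwarder resumes from the right offset for these files should be verified in a test environment first.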