I am having trouble finding answers about a universal forwarder that sometimes fails to re-ingest an XML file. The forwarder logs the following message:
02-08-2023 11:11:40.348 +0000 INFO WatchedFile [10392 tailreader0] - Checksum for seekptr didn't match, will re-read entire file='ps_Z00000ldpowf9tXp9iZcoMZgvijew.log'.
This is an XML file. It is first created as a small file. Later, an application writes a new version under a temporary name and then renames it to the original name. This can happen seconds after the file is created, or many minutes or even hours later.
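To make the rewrite pattern concrete, here is a minimal Python sketch of the write-then-rename sequence described above. It is not the application's actual code: the function names, the `.tmp` suffix, and the file contents are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def create_small_file(path: str, content: str) -> None:
    """Create the initial, small version of the file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


def rewrite_via_rename(path: str, content: str, tmp_suffix: str = ".tmp") -> None:
    """Write the new version under a temporary name, then rename it over the original.

    The temporary name (path + tmp_suffix) is an assumption for this sketch.
    """
    tmp_path = path + tmp_suffix
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    # On Linux the rename replaces the original name in one step, so a reader
    # sees either the old file or the new one under that name.
    os.replace(tmp_path, path)
```

Calling `create_small_file(path, ...)` and then `rewrite_via_rename(path, ...)` a few seconds later in a monitored directory should approximate the case that fails for me.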
My problem is that this event suggests that the forwarder knows that the file has changed but the new content of the file is not ingested.
It will be ingested as expected if I manually modify the top of the file later. At that point, I see:
02-08-2023 16:21:51.439 +0000 INFO WatchedFile [10392 tailreader0] - Checksum for seekptr didn't match, will re-read entire file='ps_Z00000ldpowf9tXp9iZcoMZgvijew.log'.
02-08-2023 16:21:51.439 +0000 INFO WatchedFile [10392 tailreader0] - Will begin reading at offset=0 for file='ps_Z00000ldpowf9tXp9iZcoMZgvijew.log'.
And the new version of the file is finally available.
This is a universal forwarder.
This is a Linux server.
The new version of the XML file is 2233 bytes long. The length of the file does not seem to be a problem.
A transform exists on the indexers to load the content as one event. This works fine.
I do not believe my problem is related to initCrcLength, because the forwarder did notice that the file has changed.
I blacklist the name of the temporary file.
Setting "multiline_event_extra_waittime" to true or false does not help.
Ingestion and re-ingestion work fine most of the time. Maybe one in every 20 files does not get re-ingested as expected, and it is usually a file that was re-written a few seconds after it was created.
My question is the following: why is the file sometimes not re-indexed if the forwarder says it will do it?
I can see that a timing/race condition may be at play, but the logs do not show anything other than the INFO records. Would changing the debugging level help? What other input parameters could help if this is a timing problem?
I could not find a solution online because pretty much all conversations related to this INFO message are about stopping file re-ingestion, so I have not been successful in finding my needle.