We have a Splunk instance set up on Windows to index Apache log files on a remote Windows machine over an UNC path. Something like \Server\ApacheLogs\access..log where the . part represents a timestamp for the daily rotated logs.
This generally works fine, but lately we started seeing issues with significant gaps due to missing data in our activity graphs.
Looking more into it to try to pin down the cause it would seem that it is related to these types of error messages in the splunk log:
07-07-2012 23:05:49.683 -0400 WARN FileInputTracker - Error reading CRC: The process cannot access the file because another process has locked a portion of the file.
07-07-2012 23:05:49.683 -0400 WARN WatchedFile - encountered error computing crc, hint: seekptr=18400131,startread=18399875,readsz=256
07-07-2012 23:05:49.683 -0400 ERROR TailingProcessor - Ignoring path due to: failed to compute crc for \Server\ApacheLogs\access_12-07-08.log (method: 0, hint: The process cannot access the file because another process has locked a portion of the file.)
Now.. this is to a fully normal situation as the log files will be locked on and off as Apache actually updates them. But of course we do not want this to cause Splunk to fail completely in getting the data (it does not look like a retry is performed, and parts of the file has already been indexed successfully).
Does anyone have ideas on how to avoid this conflict situation?
This behavior was considered a bug. Clasically Splunk tailing gave up for certain classes of file access errors and did not retry. This is the engineering principle of keeping things simple until you are certain the simple things work properly.
In 4.3.3 behavior on file access errors has been converted into an exponential backoff behavior. Thus 4.3.3 will probably report this problem, close the file, and retry it later (not looking at the log messages right now). So long as splunk eventually gets to monitor the file, you should eventually get the data.