I'm using Splunk 6.3.2 with a simple monitor stanza in inputs.conf that watches all the *.txt files in a particular directory.
The files in this directory are loaded using a cron job that runs every 10 minutes and uses LFTP to do a mirror from the remote server's log directory. The most recent log file grows throughout the current hour, and may be pulled several times before it stabilizes and the remote server moves to a new log file.
This sets things up for a failure, because it appears that LFTP truncates the current hour's file in place before re-pulling it (it doesn't seem to pull to a temp file and then move that to the destination). We're getting duplicate log lines from re-indexing, along with a WatchedFile warning that states:
"File too small to check seekcrc, probably truncated. Will re-read entire file" followed by "Will begin reading at offset=0".
I'm hoping somebody knows of a quick trick here in either Splunk (a way to delay before doing the CRC check) or LFTP (some way to force a smarter mirror, or to use tmp/mv instead of truncate) that would spare me a more custom solution. The closest thing I've seen in Answers is folks pulling to another directory and then moving the files into the monitored one. I can certainly do that, but it complicates things a bit given the desire to do a mirror (I may just have to burn 2x storage and keep two copies).
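For what it's worth, the stage-then-move fallback could be sketched roughly as below. This is only a minimal sketch under assumptions: LFTP would mirror into a separate staging directory, and a script run from the same cron job would then publish new or changed files into the monitored directory via a temp file and an atomic rename. The function name `publish_changed_files` and both directory arguments are hypothetical, and I have not verified that Splunk treats a file replaced this way as the same, longer file rather than a new one.

```python
import os
import shutil
import tempfile


def publish_changed_files(staging_dir, monitored_dir, suffix=".txt"):
    """Copy new or changed files from staging_dir into monitored_dir.

    Each file is first copied to a hidden temp file inside monitored_dir and
    then renamed over the destination, so a reader of the destination path
    never sees a truncated or half-written file.
    Returns the list of file names that were published.
    """
    published = []
    os.makedirs(monitored_dir, exist_ok=True)
    for name in sorted(os.listdir(staging_dir)):
        if not name.endswith(suffix):
            continue
        src = os.path.join(staging_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(monitored_dir, name)
        src_stat = os.stat(src)
        if os.path.exists(dst):
            dst_stat = os.stat(dst)
            # Skip files whose size and modification time are unchanged.
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue
        # The temp name ends in ".tmp", so a *.txt monitor should ignore it.
        fd, tmp_path = tempfile.mkstemp(dir=monitored_dir, prefix=".", suffix=".tmp")
        os.close(fd)
        try:
            shutil.copy2(src, tmp_path)  # copy2 also preserves the mtime
            os.replace(tmp_path, dst)    # atomic rename within one filesystem
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
        published.append(name)
    return published
```

The cron job would run the LFTP mirror against the staging directory first and then call something like `publish_changed_files("/path/to/staging", "/path/to/monitored")` (both paths are placeholders). The temp file lives in the monitored directory so that the rename stays on one filesystem. This is exactly the variant that costs 2x storage.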
Looking forward to some been-there-done-that experience that resulted in a clean/efficient solution. Thank you all!