We have some files that we're monitoring through a universal forwarder, and we're seeing behavior where, as a file is rotated, some rotations aren't indexed at all. For each rotation we either get everything in the file or nothing at all.
We regularly see messages lining up with the rotations indicating that the file is being recognized as rotated, even when we've confirmed that the file was missed. The log sample below appears to be the same whether the file is indexed or not.
05-06-2015 19:24:31.097 -0500 INFO WatchedFile - Will begin reading at offset=0 for file='/var/log/infra/task_log.log'.
05-06-2015 19:24:31.097 -0500 INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/var/log/infra/task_log.log'.
05-06-2015 19:24:31.097 -0500 INFO BatchReader - Removed from queue file='/var/log/infra/task_log.log'.
These files have new content added to them at ~5 minute intervals and rotate roughly once an hour, and we lose roughly 4 rotations' worth of data a day. We also regularly see messages indicating that we're exceeding the thruput maxKBps limit, which is currently at the default. From rough napkin math, the total throughput we'd need over a day is slightly above the maxKBps default.
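The napkin math can be sanity-checked like this. A minimal sketch; the daily volume figure below is a hypothetical placeholder, not a measured number from our logs:

```python
# Rough check of required average throughput against the forwarder's
# default maxKBps (256 KB/s in limits.conf).

DEFAULT_MAX_KBPS = 256          # universal forwarder default
SECONDS_PER_DAY = 24 * 60 * 60

def required_kbps(daily_volume_kb: float) -> float:
    """Average KB/s needed to move daily_volume_kb in one day."""
    return daily_volume_kb / SECONDS_PER_DAY

# Hypothetical example: ~23 GB/day of log data
daily_kb = 23 * 1024 * 1024
avg = required_kbps(daily_kb)
print(f"required avg ~{avg:.0f} KB/s vs default {DEFAULT_MAX_KBPS} KB/s")
```

If the required average sits just above 256 KB/s, any burstiness means the forwarder spends part of each day throttled, which matches the symptoms described.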
What behavior should we expect from a file monitor when maxKBps is exceeded? Should the forwarders just hold their current seekptr until they've dropped back under some moving average? Or, as it appears to be working, are the current files finished in their entirety, with a later rotation of the file being dropped?
Also, should we worry about the "Removed from queue" messages? They don't seem to make a difference, as we also see them for rotations that are forwarded successfully.
If you use a current version of the forwarder, you should see the following INFO notice in splunkd.log:
INFO ThruputProcessor - Current data throughput (262 kb/s) has reached maxKBps. As a result, data forwarding may be throttled. Consider increasing the value of maxKBps in limits.conf.
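If the sustained rate really is above the default, raising the cap in limits.conf on the forwarder is the usual remedy. The value below is illustrative, not a tuned recommendation for your environment:

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[thruput]
# Value is in KB/s; 0 means unlimited.
# 512 is an illustrative figure only.
maxKBps = 512
```

Restart the forwarder after changing this so the new limit takes effect.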
If the forwarder is throttled to the point that your log file rolls before Splunk has seen the events, and you have not configured your monitor input to monitor the rolled file, you will "lose" those events.
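One way to cover the rolled files is a monitor stanza whose path also matches the rotated names. Something like the following sketch, with paths, index, and sourcetype assumed from this thread rather than taken from an actual config:

```ini
# $SPLUNK_HOME/etc/system/local/inputs.conf (illustrative)
# The trailing wildcard matches task_log.log as well as rolled
# copies such as task_log.log.1; Splunk's CRC checks prevent
# re-indexing data it has already read from the pre-rotation file.
[monitor:///var/log/infra/task_log.log*]
index = main
sourcetype = task_log
```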
Sorry, I should have posted one of those log lines as well; they look similar to the one you noted (with varying numbers, all slightly above the 256 KB/s default).
Currently our rolled files are gzipped and then deleted, so we have "task_log.log" (new and empty) and "task_log.log.1.gz". The behavior I'm seeing is that we get everything for the file that was rolled into the gzip, and nothing for the new task_log.log when the file is dropped.
So, presuming we're throttled constantly, what seems to be happening is that the file handle is kept open for the file that was rolled until it has been indexed to EOF. We also see the messages from the original post showing that the rotation was recognized and the offset reset, but for maybe 4 out of 24 rotations in a day we see no events from the file. I would have expected the behavior you seem to be pointing at: on rotation the monitor advances to the new file, and some portion at the end of the previous instance goes missing.