I've been reading through most of the topics related to the re-indexing of log files and Splunk creating duplicate entries. I've been experiencing the same issues for the past couple of months and have yet to resolve this duplication issue. What I'm really after is understanding what the
WatchedFile - Checksum for seekptr didn't match, will re-read entire file='access.log'
WatchedFile - Will begin reading at offset=0 for file='access.log'
log entries mean in splunkd.log. Does anyone have insight into what is creating the info message and why Splunk has to re-index the entire log file? Here is a snippet of my inputs.conf stanza:
# monitor access logs from all nodes (node1-4)
[monitor:///path/to/logs/node*/access.log]
index = my_index
sourcetype = access_common
crcSalt = <source>
The other weird issue I notice is that the re-indexing is only happening on the logs from node2-4; it does not re-index the log from node1, which is where the Splunk forwarder resides.
Typically it means that your file, in this case, access.log, rolled over. Splunk sees that there's a new file called access.log, but it looks different from the old file that it had been monitoring. It assumes that your log rolled and so it starts reading it from the beginning.
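To make the rollover scenario concrete, here is a minimal sketch (in Python, with a hypothetical temp directory and made-up log lines) of what a typical logrotate-style roll does on disk: the old contents move aside under a new name, and a fresh file appears under the original name.

```python
import os
import tempfile

# Hypothetical directory standing in for /path/to/logs/node1
logdir = tempfile.mkdtemp()
log = os.path.join(logdir, "access.log")

# The app has been writing to access.log
with open(log, "w") as f:
    f.write("10.0.0.1 - - GET /index.html\n")

# The roll: same bytes, new name...
os.rename(log, log + ".1")

# ...and a brand-new access.log starts filling with different entries.
with open(log, "w") as f:
    f.write("10.0.0.2 - - GET /login\n")

# The monitored path now points at a file whose beginning no longer
# matches what the monitor remembered, so it looks like a new file.
print(open(log).read())
print(open(log + ".1").read())
```

After the rename, anything keyed on the file's initial bytes (rather than its name) will conclude that `access.log` is a different file and start reading it from offset 0.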
Did anyone find a solution to this? We've stumbled upon the same problem it seems.
If the file has a header, this would work the other way around, right? Splunk wouldn't read new files because it thinks they are the same as the old rolled ones. So the file having a header couldn't be the problem here.
Did this ever get answered? The log is rolling over and I'm getting this:
WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/path/to/logs/node*/abc.log'
So the issue is what's described above by mloven on Sep 16. How do I resolve it?
Mike737 and I are having the same issue with IIS log files (see the related Splunk question http://bit.ly/1a7ULmz) - they are not rolling over, yet Splunk is thinking they must be (I guess) and is re-reading them, creating duplicate entries... Any suggestions are appreciated and will be tested... 🙂
How big are your access logs? How old is the oldest data in those logs? Are the logs actually rolling over?
Splunk is detecting that the first 256 bytes of the file (by default) are different than the last time it tried to read the file. This commonly happens when a file rolls over. The "old" file becomes something like access.log.1, and your app starts logging to access.log. Splunk checks the file, sees that the first 256 bytes of the file are different, and basically assumes that your file rolled over. Therefore it starts over from the beginning of the file.
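The comparison described above can be sketched in a few lines. This is not Splunk's actual code, just an illustration of the idea: checksum the first 256 bytes, remember it, and if the head of the file later produces a different checksum, treat the file as rolled and re-read from the start. The 256-byte length and the sample log lines are assumptions for the demo.

```python
import zlib

INIT_CRC_LENGTH = 256  # assumed default head length, per the explanation above

def head_crc(data: bytes, length: int = INIT_CRC_LENGTH) -> int:
    """CRC over the first `length` bytes of a file's contents."""
    return zlib.crc32(data[:length])

# Before the roll: the monitor remembers the CRC of the file head.
old_contents = b"10.0.0.1 - - [16/Sep] GET /index.html 200\n" * 20
remembered = head_crc(old_contents)

# After the roll, access.log starts with brand-new entries, so the
# head CRC no longer matches and the whole file is re-read from offset 0.
new_contents = b"10.0.0.9 - - [17/Sep] GET /login 200\n" * 20
assert head_crc(new_contents) != remembered
```

Note the corollary: two different files whose first 256 bytes are identical (for example, logs that all begin with the same header) would produce the same checksum, which is what `crcSalt = <source>` is meant to work around by mixing the file path into the hash.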
Here's where it gets tricky. My forwarder is doing this every minute to the access logs. Since the server logs are constantly being written to, I don't believe Splunk should be detecting that as a rollover. Or is it? What exactly does Splunk check to decide whether a log file has rolled over? i.e., from the stat command: access time, modified time, change time, inode change, etc.?
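For anyone wanting to poke at the stat fields mentioned above, here is a small sketch of how to read them (using a throwaway temp file; the fields shown are the ones a rollover check could plausibly compare, not a claim about which ones Splunk actually uses):

```python
import os
import tempfile

# Create a throwaway file standing in for access.log
fd, path = tempfile.mkstemp()
os.write(fd, b"10.0.0.1 - - GET /index.html\n")
os.close(fd)

st = os.stat(path)
print("inode:", st.st_ino)    # changes when the file is replaced by a roll
print("size:", st.st_size)    # drops back to ~0 right after a roll
print("mtime:", st.st_mtime)  # last write to the file
print("ctime:", st.st_ctime)  # last inode change (rename, chmod, ...)

os.unlink(path)
```

A constantly-written file keeps the same inode and a growing size; a rolled file shows a new inode and a sudden size drop, so watching those two fields side by side is a quick way to confirm whether a roll actually happened.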