I have enabled Windows auditing on a Windows machine and mounted the directory where all logs are written on an Ubuntu machine where Splunk is installed. I am then monitoring the mounted audit file from the Splunk instance. The monitored file is in XML format, the events are single-line, and the last line in the XML file is always `</Events>`. Every new event is written before the last line, so on the second-to-last line.
The problem is that every time new events are written to the monitored XML file, Splunk re-indexes the entire file.
When I search for "index=_internal sourcetype=splunkd component=watchedfile", I get the result "INFO WatchedFile - Checksum for seekptr didn't match, will re-read the entire file='/mnt/netappaudit/audit/auditsplunkaudit_last.xml'".
Other than that, the events are parsed correctly in Splunk.
Why is the entire file re-indexed every time logs are written to the monitored XML file?
Is it possible to get Splunk to read events only up to the second-to-last line?
When a new event is appended, are older events removed from the file as well? Or does the file keep growing?
Can you please edit your question and put the content of the last line between ` characters?
I guess the file actually ends with some XML tag, but due to how this Splunk Answers forum works, anything like `<bla>` disappears when it is not posted in code tags.
If the last line contains a closing XML tag that shifts down every time an event is added, that is why Splunk fails on it. Not sure if there is any way to fix that. Really weird logging format to be honest, can this not be configured differently? What kind of logs are we actually talking about?
Yes, it is a closing XML tag, `</Events>`. I will check whether there is any way to change the logging format. The logs are NetApp security auditing logs.
Yeah, so that messes things up. Splunk keeps track of the last line it has read and expects new logs to be added after that. In your case, that last line is the line with `</Events>`. When a new event is added, it is written on that same line and the `</Events>` is shifted down. So Splunk detects that the last line it had already read has changed, and that triggers re-reading the entire file.
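To make that concrete, here is a toy model of the idea in Python. It is only a sketch of the general technique (remember a read offset plus a checksum of the bytes just before it), not Splunk's actual implementation; the class name `TailTracker`, the 256-byte window, and the sample events are all made up for illustration.

```python
import zlib

TAIL = 256  # hypothetical window size, chosen only for this illustration


class TailTracker:
    """Toy file monitor: remembers a read offset and a checksum of the bytes before it."""

    def __init__(self):
        self.seekptr = 0   # offset up to which the file has been read
        self.seek_crc = 0  # checksum of the bytes just before seekptr
        self.rereads = 0   # how often the whole file had to be read again

    def _crc_before(self, data, offset):
        return zlib.crc32(data[max(0, offset - TAIL):offset])

    def poll(self, data):
        """Return the bytes that would be indexed for the current file content."""
        changed = (len(data) < self.seekptr
                   or self._crc_before(data, self.seekptr) != self.seek_crc)
        if changed:
            # Already-read bytes no longer match: start over from offset 0.
            self.rereads += 1
            start = 0
        else:
            start = self.seekptr
        self.seekptr = len(data)
        self.seek_crc = self._crc_before(data, self.seekptr)
        return data[start:]


def render(events):
    """Build a file whose last line is always the closing tag."""
    return ("<Events>\n" + "".join(e + "\n" for e in events) + "</Events>\n").encode()


# Case 1: closing tag on the last line, new event written in front of it.
xml_tracker = TailTracker()
xml_tracker.poll(render(["<Event id='1'/>"]))
xml_second = xml_tracker.poll(render(["<Event id='1'/>", "<Event id='2'/>"]))
# Checksum mismatch, so the whole file is returned again.

# Case 2: plain append-only file, new event written after the last line.
plain_tracker = TailTracker()
plain_tracker.poll(b"<Event id='1'/>\n")
plain_second = plain_tracker.poll(b"<Event id='1'/>\n<Event id='2'/>\n")
# Checksum still matches, so only the new line is returned.
```

In the first case the bytes that used to hold `</Events>` now hold the start of the new event, so the stored checksum no longer matches and everything is read from the top. In the second case nothing that was already read has changed.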
I don't see any settings in inputs.conf that can change that behavior, so I guess looking at the data source to see if it can log in a different way is your best bet.
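If the logging format cannot be changed at the source, one possible workaround is a small script that copies new event lines into a separate append-only file and to point the Splunk monitor at that file instead. The sketch below is only an illustration under a few assumptions: the audit file only grows between rotations, every line except the closing `</Events>` should be forwarded, and the function name and the three paths passed to it are hypothetical.

```python
from pathlib import Path


def forward_new_events(src: Path, dst: Path, state: Path) -> int:
    """Append lines from src that were not forwarded yet to dst; return how many."""
    lines = src.read_text(encoding="utf-8").splitlines()
    # Drop the closing tag so the output file is strictly append-only.
    body = [ln for ln in lines if ln.strip() != "</Events>"]

    # The state file holds the number of lines forwarded so far.
    done = int(state.read_text()) if state.exists() else 0
    if done > len(body):
        done = 0  # source shrank, assume it was rotated and start over

    new = body[done:]
    if new:
        with dst.open("a", encoding="utf-8") as out:
            out.writelines(ln + "\n" for ln in new)
    state.write_text(str(len(body)))
    return len(new)
```

This would have to run periodically (for example from cron), and it re-reads the source file on every run, so it is only reasonable for files of moderate size.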
I've upgraded my comment to an answer, so you can accept it.