I've been testing Splunk for several months now, and I keep running into duplicate events appearing in the index. This sort of inconsistency is a show-stopper, so I decided to write a simple test case to investigate.
Our actual conditions are:
A cron job downloads the remote log file to /tmp via HTTP.
When the download completes, the file in /tmp overwrites the old log file.
Splunk picks up the file and should index only the new entries.
However, throughout the day we see a lot of duplicate entries, often on the order of 10x the original number of log entries.
Experimenting, it seemed that Splunk might act differently depending on whether the log is being overwritten or appended to.
I ran a test script for about 12 hours, which continually added new lines to two log files: one by appending to a temp file and then overwriting the monitored file with it, the other by appending directly to the monitored file.
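For anyone who wants to reproduce this, here is a minimal sketch of what such a test could look like. It is not the exact script that was run: the function names, the line format, and the use of `shutil.copyfile` for the overwrite step are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
import time
from datetime import datetime


def make_line(seq: int) -> str:
    # Hypothetical line format: a timestamp followed by a sequence number.
    return f"{datetime.now():%Y-%m-%d %H:%M:%S} seq={seq} test event\n"


def append_directly(monitored_path: str, line: str) -> None:
    # Case 1: the monitored file only ever grows in place.
    with open(monitored_path, "a") as f:
        f.write(line)


def append_via_overwrite(temp_path: str, monitored_path: str, line: str) -> None:
    # Case 2: grow a temp copy, then rewrite the whole monitored file from it.
    with open(temp_path, "a") as f:
        f.write(line)
    # copyfile truncates the destination and writes the full contents again
    # (assumption: the real workflow overwrites in a comparable way).
    shutil.copyfile(temp_path, monitored_path)


def run(directory: str, iterations: int, delay: float = 0.0) -> None:
    append_log = os.path.join(directory, "append.log")
    overwrite_log = os.path.join(directory, "overwrite.log")
    # Keep the temp file outside the monitored directory.
    temp_dir = tempfile.mkdtemp()
    temp_log = os.path.join(temp_dir, "overwrite.tmp")
    try:
        for seq in range(iterations):
            line = make_line(seq)
            append_directly(append_log, line)
            append_via_overwrite(temp_log, overwrite_log, line)
            time.sleep(delay)
    finally:
        shutil.rmtree(temp_dir)


if __name__ == "__main__":
    # Short demo run in a scratch directory; point `directory` at the
    # monitored path and raise `iterations` for a longer test.
    demo_dir = tempfile.mkdtemp()
    run(demo_dir, iterations=10)
    print(sorted(os.listdir(demo_dir)))
```

Both files receive the same lines in the same order, so any difference in indexed event counts comes down to how each file was written.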
Raw files:
$ wc -l ~/data/test/*
63987 /home/splunk/data/test/append.log
63374 /home/splunk/data/test/overwrite.log
In Splunk:
/home/splunk/data/test/overwrite.log | 8,994,387
/home/splunk/data/test/append.log | 63,965
You can clearly see the reindexing issue in overwrite.log. The small difference in append.log is likely Splunk not having indexed some entries yet.
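To put numbers on that, a quick calculation using only the counts quoted above (the variable names are just for illustration):

```python
# Line counts from `wc -l` and event counts reported by Splunk, as quoted above.
raw = {"append.log": 63987, "overwrite.log": 63374}
indexed = {"append.log": 63965, "overwrite.log": 8994387}

for name in raw:
    ratio = indexed[name] / raw[name]      # indexed events per raw line
    diff = indexed[name] - raw[name]       # negative means not yet indexed
    print(f"{name}: indexed/raw = {ratio:.2f}, difference = {diff:+d}")
```

That works out to append.log being 22 events short of the raw file, while overwrite.log has roughly 142 indexed events for every line actually in the file.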
My settings look like this:
[monitor://home/splunk/data/test/]
disabled = false
host = test
index = main
crcSalt = <SOURCE>
followTail = 1
(The actual log file I'm having trouble with contains a long header line, which is why I'm using crcSalt).
I noticed this in the change history for 4.1.4:
"monitor inputs using the followTail setting sometimes will index some older events or all events from log files which are updated when not intended. (SPL-23555) "
Are there any more details on what type of problems SPL-23555 fixed? I haven't seen any changes in behaviour after upgrading.
I've also been receiving these messages in my splunkd.log intermittently:
Time parsed (Mon Sep 6 00:02:16 2010) is too far away from the previous event's time (Mon Sep 6 14:55:55 2010) to be accepted.
The first date comes from the beginning of the file, the second from the end. Does this mean that Splunk is trying to scan the overwritten file before writing is completed, thus treating it as a completely different file with new entries?
Is overwriting log files an acceptable practice when using Splunk? Is this a bug of some kind?
Many thanks,
Chris