We are currently monitoring a log file that tracks available and unavailable time using the universal forwarder. The issue we are running into is that we get duplicate events because Splunk seems to re-index the whole log every minute.
The log looks like this:
Unavailable 09.09.2015 18:31:11 - 09.09.2015 18:33:11
Available 09.09.2015 18:34:11 - 10.09.2015 10:49:14
Unavailable 10.09.2015 10:50:14 - 10.09.2015 11:11:14
Available 10.09.2015 11:12:14 - 17.09.2015 16:47:50
Unavailable 17.09.2015 16:48:50 - 17.09.2015 16:48:50
Available 17.09.2015 16:49:50 - 21.01.2016 12:48:27
Unavailable 21.01.2016 12:49:27 - 22.01.2016 17:28:33
Available 22.01.2016 17:29:30 - 22.01.2016 17:29:30
Unavailable 22.01.2016 17:29:33 - 22.01.2016 17:29:33
Available 22.01.2016 17:30:30 - 22.01.2016 17:30:30
Unavailable 22.01.2016 17:30:33 - 22.01.2016 17:30:33
Available 22.01.2016 17:31:30 - 22.01.2016 17:31:30
The file is updated like this: the end time on the last line is rewritten every minute until the status changes to unavailable, at which point a new line is created. Also, we use index time because the entries do not get individual timestamps when they are written.
Does anyone have any ideas on how we can stop the re-indexing/duplicate events?
The events themselves look good. The only issue is that each event is duplicated every time the file updates. The log has about 200,000 lines, but the index contains up to a couple of million events due to the re-indexing.
Splunk may re-index an entire file like this if something changes at the beginning or middle of the file. When that happens, it can assume the file is new and re-index the whole thing. Is it possible something like that is happening? Maybe older events are being removed from the file or "rolling" out? In your case, rewriting the end time on the last line every minute modifies data Splunk has already read, which can have the same effect.
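If it turns out Splunk is treating the file as new on each update, one knob worth checking is the monitor stanza on the forwarder. A minimal sketch (the path, index, and sourcetype below are assumptions, not your actual settings):

```
# inputs.conf on the universal forwarder
[monitor:///var/log/availability.log]
index = main
sourcetype = availability_log
# Splunk identifies a file by a CRC of its first 256 bytes by default.
# If several monitored files share the same header, or the head of the
# file changes, raising this can reduce misidentification:
initCrcLength = 1024
```

You can also compare the forwarder's view of the file against the fishbucket with `splunk list monitor` on the forwarder to confirm whether it is repeatedly picking the file up as new. Note that none of this helps if the real trigger is the last line being rewritten in place every minute; in that case the cleaner fix is to have the application append a new line per update rather than modify existing bytes.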