Windows Server 2008 R2 x64 (Windows AD Domain Controller) / Splunk 4.1.1 set up as a full forwarder (custom app via deployment server).
Upon booting following a BSOD, Splunk re-sent the entire Windows security log from its earliest event (> 4 GB worth; > 8 million events). Other logs did not seem to be similarly affected.
Just an idea, but this is how I would do it:
http://www.splunk.com/base/Documentation/latest/Admin/MonitorWindowsdata
start_from = oldest
current_only = 1
We'd end up losing events that way. The AD domain controller logs are quite chatty and purge their oldest events as quickly as new events come in (~40 events per second). Thanks for the thought, though.
The markers for the eventlogs are stored on disk in $SPLUNK_HOME/var/lib/splunk/persistentstorage/... something or other. I'm presuming that somehow they got corrupted or erased on the BSOD. Can you evaluate what's in this directory?
To identify the dupes you could run a search over a particular time range, something like:
host=problemhost sourcetype=WinEventLog* | stats count as frequency by _raw | where frequency > 1
If this reliably identifies them, someone smarter than me can figure out a search that keeps the first event of each duplicated set and deletes the rest.
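To make the "keep the first, drop the rest" idea concrete, here is a rough sketch in Python, outside of Splunk entirely. It assumes you have exported the events as (id, raw text) pairs; the function name, the ids, and the sample data are all made up for illustration and are not part of any Splunk API. It also assumes, like the search above, that two events with identical raw text really are duplicates.

```python
def find_redundant_events(events):
    """Return the ids of every event whose raw text was already seen.

    `events` is an iterable of (event_id, raw_text) pairs in arrival order.
    The first occurrence of each raw text is kept; later ones are reported.
    """
    seen = set()
    redundant = []
    for event_id, raw in events:
        if raw in seen:
            redundant.append(event_id)  # a repeat: candidate for deletion
        else:
            seen.add(raw)               # first occurrence: keep it
    return redundant


# Hypothetical sample data standing in for exported events.
sample = [
    (1, "4624 logon user=alice"),
    (2, "4634 logoff user=alice"),
    (3, "4624 logon user=alice"),
    (4, "4624 logon user=alice"),
]
print(find_redundant_events(sample))
```

Whatever does the actual deletion would then operate only on the returned ids, so one copy of each event always survives.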
Yeah, I had already figured out the search. If I cared enough I'd work out a way to safely purge the duplicates, but for now I'm going to let it be.