We have a forwarder monitoring a log file and are seeing duplicate data from that file indexed (by several indexers within the autoLB group).
I'm seeing the following in the splunkd.log file on the forwarder:
splunkd.log:02-28-2013 13:42:34.044 +0000 INFO WatchedFile - Will begin reading at offset=13142179 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:34.043 +0000 INFO WatchedFile - Will begin reading at offset=13161047 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:44.092 +0000 INFO WatchedFile - Will begin reading at offset=13138930 for file='<filename removed>'.
splunkd.log:02-28-2013 13:49:44.297 +0000 INFO WatchedFile - Will begin reading at offset=13274923 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:34.333 +0000 INFO WatchedFile - Will begin reading at offset=13329736 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:54.349 +0000 INFO WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:51:04.367 +0000 INFO WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:54:14.523 +0000 INFO WatchedFile - Will begin reading at offset=13320589 for file='<filename removed>'.
As you can see, the offset at which reading begins occasionally decreases, so the same portion of the file gets read more than once.
Any suggestions as to what the issue might be? (I know our indexers are somewhat overloaded at present, but I'm not seeing many failed ACKs in the log file.)
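A quick way to spot these offset regressions without eyeballing the log is to parse the `offset=` values and flag any decrease. This is a sketch, assuming the splunkd.log format quoted above; the `/tmp` sample file is only there to make it self-contained — point the awk command at the real splunkd.log instead.

```shell
# Flag backwards offset jumps in WatchedFile messages.
# The sample lines mimic the splunkd.log excerpts from the question.
cat > /tmp/sample_splunkd.log <<'EOF'
02-28-2013 13:43:34.043 +0000 INFO WatchedFile - Will begin reading at offset=13161047 for file='x'.
02-28-2013 13:43:44.092 +0000 INFO WatchedFile - Will begin reading at offset=13138930 for file='x'.
EOF
awk -F'offset=' '/WatchedFile/ {
    n = $2 + 0                                  # leading digits after "offset="
    if (seen && n < prev)
        print "offset decreased: " prev " -> " n
    prev = n; seen = 1
}' /tmp/sample_splunkd.log
```

Every line where the offset is lower than the previous one marks a window of the file that will be re-read and re-indexed.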
Edit:
inputs.conf:
[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy
outputs.conf
[tcpout]
defaultGroup = default-autolb-group
disabled = false
maxQueueSize = 6MB
[tcpout:default-autolb-group]
autoLB = true
disabled = false
server = <servernames>
useACK = true
While not strictly necessary, you may want to add
followTail = 1
to your inputs.conf stanza so the forwarder ignores the older data, which may be what is causing the issue.
From the docs:
followTail = 1
Can be used to force splunk to skip past all current data for a given stanza.
* In more detail: this is intended to mean that if you start up splunk with a stanza configured this way, all data in the file at the time it is first encountered will not be read. Only data arriving after that first encounter time will be read.
* This can be used to "skip over" data from old log files, or old portions of log files, to get started on current data right away.
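For reference, the resulting stanza would look like this — a sketch based on the inputs.conf posted in the question, with the redacted path placeholder kept as-is:

```ini
[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy
followTail = 1
```

Note that followTail only skips data already present when the file is first encountered; it won't remove duplicates that have already been indexed.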
Could you provide the details of how the input is configured?
Thanks, details added to the original post.