Getting Data In

Splunk data duplicated when reading offset decreases

samhughe
Path Finder

We have a forwarder monitoring a log file and are seeing duplicated data indexed from that file (by a number of indexers within the autoLB group).

I'm seeing the following in the splunkd.log file on the forwarder:

splunkd.log:02-28-2013 13:42:34.044 +0000 INFO  WatchedFile - Will begin reading at offset=13142179 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:34.043 +0000 INFO  WatchedFile - Will begin reading at offset=13161047 for file='<filename removed>'.
splunkd.log:02-28-2013 13:43:44.092 +0000 INFO  WatchedFile - Will begin reading at offset=13138930 for file='<filename removed>'.
splunkd.log:02-28-2013 13:49:44.297 +0000 INFO  WatchedFile - Will begin reading at offset=13274923 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:34.333 +0000 INFO  WatchedFile - Will begin reading at offset=13329736 for file='<filename removed>'.
splunkd.log:02-28-2013 13:50:54.349 +0000 INFO  WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:51:04.367 +0000 INFO  WatchedFile - Will begin reading at offset=13281747 for file='<filename removed>'.
splunkd.log:02-28-2013 13:54:14.523 +0000 INFO  WatchedFile - Will begin reading at offset=13320589 for file='<filename removed>'.

As you can see, the offset at which the forwarder begins reading the file occasionally decreases.

Any suggestions as to what the issue may be? (I know our indexers are a bit overloaded at present, but I'm not seeing many failed ACKs in the log file.)
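
For reference, this is roughly how I've been checking the forwarder's splunkd.log for output-side problems (the component names are my guess at the relevant ones, so shout if there's something better to search for):

grep -E "WARN|ERROR" $SPLUNK_HOME/var/log/splunk/splunkd.log | grep -E "TcpOutputProc|TcpOutputFd"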

Edit:
inputs.conf:

[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy

outputs.conf:

[tcpout]
defaultGroup = default-autolb-group
disabled = false
maxQueueSize = 6MB

[tcpout:default-autolb-group]
autoLB = true
disabled = false
server = <servernames>
useACK = true
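
For what it's worth, since the indexers are loaded I've been wondering whether the acknowledgement handling is the culprit. My reading of the outputs.conf docs (an assumption on my part, so please correct me) is that with useACK = true the forwarder keeps unacknowledged data in a wait queue sized at 3x maxQueueSize, and re-sends anything not acknowledged within readTimeout, which could explain the duplicates. Something like this is what I had in mind as an experiment:

[tcpout]
defaultGroup = default-autolb-group
disabled = false
# larger output queue; with useACK the wait queue is 3x this value
maxQueueSize = 32MB

[tcpout:default-autolb-group]
autoLB = true
disabled = false
server = <servernames>
useACK = true
# give the busy indexers longer to acknowledge before blocks are re-sent
# (my understanding is that the default is 300 seconds)
readTimeout = 600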

Kate_Lawrence-G
Contributor

While not strictly necessary, you may want to add
followTail = 1
to your inputs.conf to ignore the older data, which may be what is causing the issue.

From the docs:

followTail = 1
Can be used to force splunk to skip past all current data for a given stanza.
* In more detail: this is intended to mean that if you start up splunk with a
stanza configured this way, all data in the file at the time it is first
encountered will not be read. Only data arriving after that first
encounter time will be read.
* This can be used to "skip over" data from old log files, or old portions of
log files, to get started on current data right away
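
Concretely, that would make the monitor stanza from the original post look something like this (just a sketch, with your sourcetype kept as-is):

[monitor://<path to file removed>/access_vap*.log]
sourcetype = jboss-access-proxy
followTail = 1

Note that, per the excerpt above, followTail only skips data already in the file when the stanza is first encountered -- it won't remove events that have already been indexed.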


Kate_Lawrence-G
Contributor

Could you provide the details of how the input is configured?


samhughe
Path Finder

Thanks, details added to the original post.
