Re: Duplicate records

jaterlwj · ‎07-08-2012

I have tested and realized that when monitoring a file with let's say 24 rows with the option "Continuously index data from a file or directory this Splunk instance can access".
I noticed that when I add a new row and refreshes. There are now 49 rows. The older 24 records are being duplicated. Is there any option to stop duplicate rows?

Here are some specifics.
File format: .log
Specify the source:"Continuously index data from a file or directory this Splunk instance can access."

set host: constant value
set source type: manual
destination index:default

jaterlwj · ‎07-13-2012

Any help would be good!

jaterlwj · ‎07-15-2012

Hi, Thanks for your reply.

Pardon me for my ignorance, but what should I look for under the _internal index? There's roughly 1.7m events in there. 😮

ak · ‎07-13-2012

check the _internal index. it appears the whole file is being reread, thus 24 + 25 rows.

ak · ‎07-09-2012

is your log file terminated with an end of file message, something like [END OF LOG FILE]?

if so, this will confuse splunk. splunk uses the last 256 bytes for CRC. If you have a termination message that is constantly appended to your file, the CRC check will fail. When this happens, splunk rereads the file, thus duplicating records.

See: splunk log rotation

jaterlwj · ‎07-09-2012

Hi thank you for your reply! But the log file that I used does not contain any end of file message!

Duplicate records

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

From Data to Insight: Announcing the Winners of the Splunk Dashboard Contest

Splunk Developers: Construct Your Future at the .conf26 Builder Bar

Quick connection discovery mode for forwarders

Join the Conversation