Getting Data In

How to avoid duplicate data when a file is recreated with the same name as one that was deleted earlier?

soniaraj13
New Member

Hi,

I see duplicate data being ingested when a file that was already ingested is recreated after a system failure. The recreated file contains the existing data plus new data.

For example, let's say test.csv has the following data:

a b c

When the file is deleted and recreated with the same name but with the following additional data,

a b c
1 2 3
4 5 6

it ingests a b c again in addition to 1 2 3...

Can someone help me with the correct stanza to add to inputs.conf, or any other solution, to avoid data being duplicated as in the example above?

Thanks.


tom_frotscher
Builder

Hi,

By default, Splunk reads the first 256 bytes of a file and calculates a hash value (CRC) over them. When a file appears whose hash matches one Splunk has already seen, it is treated as a known file and the content that was already read is not indexed again. If the hash does not match, the file is treated as new and is indexed from the beginning, which is how duplicates like yours can arise.

You can adjust the number of bytes that are read with this setting in inputs.conf:

initCrcLength = <integer>
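As a rough sketch of where the setting goes, it is placed inside the monitor stanza for the input in question. The path and the value 1024 below are only assumptions for illustration, so substitute your own monitored path and pick a length that suits your files. Whether a larger value resolves this particular case is something to verify in a test environment first, since changing it on an existing input can itself cause files to be re-read.

```ini
# Hypothetical monitor stanza; replace the path with your own.
[monitor:///path/to/test.csv]
# Number of bytes used for the initial file fingerprint (default is 256).
# 1024 is an illustrative value, not a recommendation.
initCrcLength = 1024
```

After editing inputs.conf, restart the forwarder or instance that owns the input so the change takes effect.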

The inputs.conf spec and examples in the Splunk documentation cover initCrcLength in more detail, along with many more configuration options for your inputs.

Greetings

Tom
