Getting Data In

Duplicate indexing of data

soumdey
Path Finder

I have a situation here.

I have an abc.txt file on server1 which I am monitoring with a forwarder.

The abc.txt file updates every hour in such a way that the entire content of the file is cleared and the same content is written back to abc.txt.

The issue is that Splunk re-indexes the data from abc.txt every time the content is removed and written back, which results in the same data being duplicated multiple times.

Can somebody please help me rectify this issue? Do I need to change the initCrcLength value?
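For reference, the CRC-related settings in question are set per monitor stanza in the forwarder's inputs.conf. A sketch with a hypothetical path and index (initCrcLength widens the file-header checksum Splunk uses to recognize a file, and crcSalt = &lt;SOURCE&gt; mixes the path into that checksum so same-headered files are treated as distinct):

```ini
# inputs.conf on the forwarder -- path, index, and sourcetype are hypothetical
[monitor:///opt/app/logs/abc.txt]
index = main
sourcetype = abc_log
# Number of bytes from the head of the file used for the CRC fingerprint (default 256)
initCrcLength = 1024
# Include the full source path in the CRC calculation
crcSalt = <SOURCE>
```

Note that neither setting is expected to stop re-indexing when a file is truncated and rewritten with the same content; they only change how Splunk fingerprints files.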

1 Solution

soumdey
Path Finder

After some R&D, I figured out what was causing the issue and how to fix it.

The issue was with how the script was writing data to the output file that Splunk was forwarding.
The script was configured so that it would erase the existing data from the file and then write the existing data plus the new data back into the file.
This led Splunk to believe it was all new data and to index the same data over again.
So, each run of the script created another full set of duplicates.

To fix the issue, we changed the script so that, instead of rewriting the complete data over and over, only the new data is appended to the file, which avoids any duplication.
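A minimal sketch of that fix (function and file names are hypothetical): open the output file in append mode and write only the records produced since the last run, instead of truncating it and rewriting old plus new content.

```python
def write_records(path, new_records):
    """Append only the records produced since the last run.

    Mode "a" appends without touching existing content. The broken
    version used mode "w", which truncates the file first, so rewriting
    old + new records made the file monitor re-index everything.
    """
    with open(path, "a", encoding="utf-8") as f:
        for record in new_records:
            f.write(record + "\n")
```

With this, each run only grows the file, so the forwarder picks up just the newly appended lines.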

Hope I made it clear for everyone following the question.


arunsunny
Path Finder

@soumdey - The great thing is that you also eliminated the duplicate Splunk license usage.



soumdey
Path Finder

Can somebody please help me out here?
