Getting Data In

Duplicate indexing of data

soumdey
Path Finder

I have a situation here.

I have an abc.txt file on server1 that I am monitoring with a forwarder.

The abc.txt file is updated every hour: the whole file is cleared and the same content is written back to it.

The problem is that Splunk indexes the data from abc.txt every time the content is removed and written back, so the same data ends up duplicated many times.

Can somebody please help me fix this? Do I need to change the initCrcLength value?

1 Solution

soumdey
Path Finder

After some investigation, I figured out what was causing the issue and how to fix it.

The issue was with how the script wrote data to the output file that Splunk was forwarding.
The script was configured to erase the existing data from the file and then write the existing data plus the new data back to it.
This made Splunk treat the file as new data and index the same data all over again.
So each run of the script created another set of duplicates.

To fix the issue, instead of rewriting the complete data every time, the script now writes only the new data to the file, which avoids the duplication.
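For anyone who wants to apply the same fix, here is a minimal sketch of an append-only writer. The original script was not posted, so this is not the actual code: the function name `append_new_lines` is hypothetical, and it assumes one record per line and that every record is unique (for example, because it carries a timestamp), since identical lines are treated as already written.

```python
from pathlib import Path


def append_new_lines(path, lines):
    """Append only lines that are not already in the file.

    The file is opened in append mode, so existing content is never
    truncated or rewritten. Returns the number of lines appended.
    Assumption: each record is a single, unique line.
    """
    target = Path(path)

    # Collect the lines that are already in the file, if it exists.
    existing = set()
    if target.exists():
        with target.open("r", encoding="utf-8") as fh:
            existing = {line.rstrip("\n") for line in fh}

    # Keep only lines not seen before, preserving their order.
    new_lines = []
    for line in lines:
        if line not in existing:
            existing.add(line)
            new_lines.append(line)

    # Append mode ("a") adds to the end of the file without clearing it.
    if new_lines:
        with target.open("a", encoding="utf-8") as fh:
            for line in new_lines:
                fh.write(line + "\n")

    return len(new_lines)
```

Calling `append_new_lines("abc.txt", records)` on every run leaves the earlier content untouched, so the forwarder only has new lines at the end of the file to pick up.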

I hope this makes it clear for everyone following the question.


arunsunny
Path Finder

@soumdey - The great thing is that you reduced duplicate usage of the Splunk license.

soumdey
Path Finder

Can somebody please help me out here?
