Getting Data In

Duplicate indexing of data

soumdey
Path Finder

I have a situation here.

I have a file, abc.txt, on server1 that I am monitoring with a forwarder.

abc.txt is updated every hour: the whole file is cleared and the same content is written back to it.

The problem is that Splunk indexes the data from abc.txt every time the content is removed and written back, so the same data ends up indexed multiple times.

Can somebody please help me fix this? Do I need to change the initCrcLength value?

1 Solution

soumdey
Path Finder

After some R&D, I figured out what was causing the issue and how to fix it.

The issue was with how the script wrote data to the output file that Splunk was forwarding.
The script was set up to erase the existing data from the file and then write the existing data plus the new data back to it.
This made Splunk treat the file as new data and index the same data all over again.
So the data was duplicated once for every run of the script.

To fix it, instead of rewriting the complete data every time, we now write only the new data to the file, which avoids the duplication.
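
As a rough illustration of the append-only approach, here is a minimal Python sketch. The actual script was not posted in this thread, so the function name `append_new_lines`, the line-per-record layout, and the assumption that each run sees the old records followed by the new ones are all hypothetical.

```python
from pathlib import Path
from typing import Iterable


def append_new_lines(output_path: str, current_lines: Iterable[str]) -> int:
    """Append only the lines not yet in output_path; return how many were added.

    Assumes (hypothetically) that current_lines holds every record so far,
    with new records always added at the end.
    """
    path = Path(output_path)

    # Lines written by earlier runs (none on the first run).
    if path.exists():
        already_written = len(path.read_text(encoding="utf-8").splitlines())
    else:
        already_written = 0

    new_lines = list(current_lines)[already_written:]
    if not new_lines:
        return 0

    # Mode "a" appends and never truncates, so the existing bytes
    # that the forwarder has already read stay untouched.
    with path.open("a", encoding="utf-8") as handle:
        for line in new_lines:
            handle.write(line + "\n")
    return len(new_lines)
```

Called once per run with the full record list, this only ever grows the file at the end, so the monitored file is never cleared and rewritten.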

Hope that makes it clear for everyone following the question.


arunsunny
Path Finder

@soumdey - The great thing is that you also cut the duplicate usage of your Splunk license.


soumdey
Path Finder

Can somebody please help me out here?
