Getting Data In

Avoid duplicate file ingestion in Splunk

test4u
Path Finder

How can I prevent Splunk from ingesting duplicate files?
I am monitoring a folder that contains a file named abcd.csv. When I make a copy of this file and paste it into the same folder, it gets ingested again. How do I stop Splunk from doing so?


lakshman239
Influencer

If you copy and place the same file, it's likely to be indexed again. As @FrankVl said, the Splunk input monitor process checks each file's CRC before indexing it. Please set up inputs.conf to index only the files or file patterns you need. Additionally, you can use whitelist/blacklist settings.

https://docs.splunk.com/Documentation/Splunk/7.2.4/Data/Howlogfilerotationishandled

https://docs.splunk.com/Documentation/Splunk/7.2.4/Data/Whitelistorblacklistspecificincomingdata


FrankVl
Ultra Champion

The whole point is that, by default, Splunk does not index files again if they are an exact copy of already ingested files.
If Splunk is ingesting those files again, that points to some specific config being in place to override that default behavior (e.g. changes to the crcSalt setting). I would look for the solution there, rather than in changing the pattern or using whitelists/blacklists.

But let's see what the current config is, so we can determine the best course of action 🙂


FrankVl
Ultra Champion

What are your inputs.conf settings for that folder? Because by default Splunk ignores files that have the same content (based on a CRC calculated over the first 256 bytes or so).
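
To illustrate the idea, here is a minimal Python sketch of a checksum taken over only the first 256 bytes of a file. It is not Splunk's actual implementation: `head_crc`, `HEAD_BYTES`, the sample data, and the two paths are made up for the example, and `zlib.crc32` merely stands in for whatever checksum Splunk uses internally. The optional salt mimics what `crcSalt = <SOURCE>` does, which is to mix the file's path into the checksum.

```python
import zlib

HEAD_BYTES = 256  # size of the leading chunk that gets checksummed (assumed default)


def head_crc(data: bytes, salt: str = "") -> int:
    """Return a CRC32 over an optional salt plus the first HEAD_BYTES of data.

    Illustration only: Splunk's real checksum routine may differ.
    """
    return zlib.crc32(salt.encode("utf-8") + data[:HEAD_BYTES])


# Hypothetical file contents; the copy is byte-for-byte identical.
original = b"id,name\n1,alpha\n2,beta\n"
copy = bytes(original)

# Without a salt, the copy yields the same checksum and looks "already seen".
same_without_salt = head_crc(original) == head_crc(copy)

# With the path as salt (the effect of crcSalt = <SOURCE>), the checksums
# differ because the two files have different paths.
same_with_path_salt = (
    head_crc(original, "/data/abcd.csv") == head_crc(copy, "/data/abcd_copy.csv")
)

print(same_without_salt, same_with_path_salt)
```

So under these assumptions, an exact copy is skipped unless something like a path-based salt makes its checksum unique.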


test4u
Path Finder

I haven't made any changes to inputs.conf as such. Here is my inputs.conf:

[script://$SPLUNK_HOME\etc\apps\S_APP\bin\S_SCRIPT_FINAL.py]
disabled = false
index = soc
interval = 60.0
sourcetype = csv


FrankVl
Ultra Champion

Right, so it is a scripted input, not a file monitor as your question suggested. So the solution probably needs to be found in the workings of that script.
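
Since the thread does not show what S_SCRIPT_FINAL.py actually does, the following is only a sketch of one way a script could skip files it has already emitted. It assumes the script reads CSV files from a folder and writes their contents to stdout (which is what a scripted input gets indexed from). The names `new_files`, `content_digest`, and the JSON state file are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def content_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's full contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def new_files(folder: Path, state_file: Path) -> List[Path]:
    """Return CSV files whose content has not been seen before.

    Digests of already-seen content are kept in a JSON state file
    (a hypothetical location chosen by the script author).
    """
    if state_file.exists():
        seen = set(json.loads(state_file.read_text()))
    else:
        seen = set()

    fresh = []
    for path in sorted(folder.glob("*.csv")):
        digest = content_digest(path)
        if digest not in seen:
            # First time this content appears: remember it and emit the file.
            seen.add(digest)
            fresh.append(path)

    # Persist the updated digests so the next run skips the same content.
    state_file.write_text(json.dumps(sorted(seen)))
    return fresh
```

The script would then print only the files returned by `new_files(...)`, so a pasted copy of abcd.csv with identical content is ignored on this and every later run.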
