Getting Data In

Avoid duplicate file ingestion in Splunk

test4u
Path Finder

How do I stop Splunk from ingesting duplicate files?
I am monitoring a folder that contains a file named abcd.csv. When I make a copy of this file and paste it into the same folder, it gets ingested again. How do I restrict Splunk from doing so?

0 Karma

lakshman239
Influencer

If you copy the file and place it in the same folder, it is likely to be indexed again. As @FrankVl said, the Splunk input monitor process checks the CRC of each file before indexing it. Please set up inputs.conf to index only the files or file patterns you need. Additionally, you can use a whitelist/blacklist.

https://docs.splunk.com/Documentation/Splunk/7.2.4/Data/Howlogfilerotationishandled

https://docs.splunk.com/Documentation/Splunk/7.2.4/Data/Whitelistorblacklistspecificincomingdata

0 Karma

FrankVl
Ultra Champion

The whole point is that, by default, Splunk does not index files again if they are an exact copy of already ingested files.
If Splunk is ingesting those files again, that points to some specific config being in place that overrides the default behavior (e.g. a changed crcSalt setting). I would look for the solution there, rather than in changing the pattern or using whitelists/blacklists.

But let's see what the current config is, so we can determine the best course of action 🙂

0 Karma

FrankVl
Ultra Champion

What are your inputs.conf settings for that folder? By default, Splunk ignores files that have the same content, based on a CRC calculated over the first 256 bytes of the file.
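
To illustrate the idea of a CRC over the start of a file, here is a minimal Python sketch. It is not Splunk's actual implementation; the function names `head_crc` and `is_duplicate` are made up for this example, and `zlib.crc32` simply stands in for whatever checksum Splunk uses internally.

```python
import zlib


def head_crc(path, length=256):
    """Return a CRC32 over the first `length` bytes of the file (illustration only)."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))


def is_duplicate(path, seen_crcs, length=256):
    """Return True if a file with the same leading bytes was already seen.

    `seen_crcs` is a set that the caller keeps between calls.
    """
    crc = head_crc(path, length)
    if crc in seen_crcs:
        return True
    seen_crcs.add(crc)
    return False
```

Note that a check like this only looks at the beginning of the file, so an exact copy of abcd.csv produces the same value as the original, and so would any other file that happens to start with the same 256 bytes.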

0 Karma

test4u
Path Finder

I haven't made any changes to inputs.conf as such. The following is my inputs.conf:

[script://$SPLUNK_HOME\etc\apps\S_APP\bin\S_SCRIPT_FINAL.py]
disabled = false
index = soc
interval = 60.0
sourcetype = csv

0 Karma

FrankVl
Ultra Champion

Right, so it is a scripted input, not a file monitor as your question suggested. So the solution probably needs to be found in the workings of that script.
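
Since the thread does not show what S_SCRIPT_FINAL.py does, the following is only a rough sketch of one way a scripted input could skip files it has already emitted: keep a small state file of content hashes and only print files whose hash is new. Everything here (the function names, the JSON state file, the `.csv` filter, hashing with SHA-256) is an assumption for illustration, not the poster's script or a Splunk API.

```python
import hashlib
import json
import os
import sys


def file_digest(path):
    """SHA-256 of the full file content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_seen(state_path):
    """Load the set of digests emitted on earlier runs (empty on first run)."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_seen(state_path, seen):
    """Persist the digests so the next scheduled run can skip them."""
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen), f)


def emit_new_files(folder, state_path, out=sys.stdout):
    """Write the content of each not-yet-seen .csv file in `folder` to `out`.

    A scripted input's stdout is what gets indexed, so skipping a file
    here keeps a copy with identical content from being ingested again.
    Returns the names of the files that were emitted.
    """
    seen = load_seen(state_path)
    emitted = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or not name.lower().endswith(".csv"):
            continue
        digest = file_digest(path)
        if digest in seen:
            continue  # same content was already emitted under some name
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            out.write(f.read())
        seen.add(digest)
        emitted.append(name)
    save_seen(state_path, seen)
    return emitted
```

With something like this, calling `emit_new_files(folder, state_path)` on each interval would print abcd.csv once and ignore a later copy with the same content, regardless of the copy's file name.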

0 Karma