Getting Data In

avoid duplicate file ingestion in splunk

test4u
Path Finder

How can I stop Splunk from ingesting duplicate files?
I am monitoring a folder that contains a file named abcd.csv. When I make a copy of this file and paste it into the same folder, it gets ingested again. How can I restrict Splunk from doing so?

lakshman239
Influencer

If you copy the same file into the folder, it is likely to be indexed again. As @FrankVl said, the Splunk input monitor process checks the CRC of each file before indexing it. Please set up inputs.conf to index only the files or file patterns you need. Additionally, you can use whitelist/blacklist.

https://docs.splunk.com/Documentation/Splunk/7.2.4/Data/Howlogfilerotationishandled

https://docs.splunk.com/Documentation/Splunk/7.2.4/Data/Whitelistorblacklistspecificincomingdata

FrankVl
Ultra Champion

The whole point is that, by default, Splunk does not index files again if they are an exact copy of already ingested files.
If Splunk is ingesting those files again, that points to some specific config being in place that overrides the default behavior (e.g. a changed crcSalt setting). I would look for the solution there, rather than in changing the pattern or using whitelists/blacklists.

But let's see what the current config is, so we can determine the best course of action 🙂

FrankVl
Ultra Champion

What are your inputs.conf settings for that folder? I ask because, by default, Splunk ignores files that have the same content (based on a CRC calculated over the first 256 bytes or so).
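
To illustrate the idea of a checksum over the start of a file, here is a minimal Python sketch. It is not Splunk's actual implementation: the function names `head_crc` and `looks_already_seen` are hypothetical, and `zlib.crc32` only stands in for whatever hash Splunk uses internally. It only shows why a renamed copy of abcd.csv would normally be recognized as already seen.

```python
import zlib


def head_crc(path, length=256):
    """CRC32 of the first `length` bytes of the file at `path`.

    Illustration only: 256 mirrors the "first 256 bytes or so" mentioned
    above; the real monitor input's hash is an assumption here.
    """
    with open(path, "rb") as f:
        return zlib.crc32(f.read(length))


def looks_already_seen(path, known_crcs):
    """True if a file with the same leading bytes was recorded before."""
    return head_crc(path) in known_crcs
```

Note that with this kind of check, the file name plays no role: a copy with a different name yields the same checksum, while two different files that share the same first 256 bytes (for example, a long identical CSV header) would also look identical.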

test4u
Path Finder

I haven't made any changes to inputs.conf as such. The following is my inputs.conf:

[script://$SPLUNK_HOME\etc\apps\S_APP\bin\S_SCRIPT_FINAL.py]
disabled = false
index = soc
interval = 60.0
sourcetype = csv

FrankVl
Ultra Champion

Right, so it is a scripted input, not a file monitor as your question suggested. So the solution probably needs to be found in the workings of that script.
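
Since the contents of S_SCRIPT_FINAL.py were not posted, the following is only a sketch of one way a scripted input could avoid re-emitting duplicate files: hash each file's content and remember the hashes between runs. The names `file_digest`, `emit_new_files`, and the JSON state file are all hypothetical, and it assumes the script reads CSV files from a folder and prints them to stdout for Splunk to index.

```python
import hashlib
import json
import os
import sys


def file_digest(path):
    """Return the SHA-256 hex digest of a file's full content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def emit_new_files(folder, state_path, out=sys.stdout):
    """Write each not-yet-seen CSV's content to `out`; return emitted names."""
    # Load digests recorded by earlier runs (empty set on the first run).
    seen = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            seen = set(json.load(f))

    emitted = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not (name.endswith(".csv") and os.path.isfile(path)):
            continue
        digest = file_digest(path)
        if digest in seen:
            continue  # identical content was already emitted
        with open(path, encoding="utf-8", errors="replace") as f:
            out.write(f.read())
        seen.add(digest)
        emitted.append(name)

    # Persist the digests so the next scheduled run skips these files.
    with open(state_path, "w") as f:
        json.dump(sorted(seen), f)
    return emitted
```

With this approach, abcd.csv and a pasted copy of it produce the same digest, so only one of them is written to stdout, and neither is written again on the next 60-second interval. The state file should live outside the monitored folder.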
