Getting Data In

Detection of duplicate files in batch mode

zapping575
Communicator

Dear all,

In my use case, the Splunk universal forwarder does not continuously monitor my logs.

For this reason, I am using batch mode so that the files are deleted after ingestion.

Now, I occasionally receive log files which I have already received at an earlier point in time.

The problem is: the settings crcSalt, initCrcLength, etc. are only available in monitor mode. This means that I cannot benefit from Splunk's features to prevent duplicate ingestion of the same data.

Any help on a solution for this is greatly appreciated.


PickleRick
SplunkTrust

I'd try writing some external "helper" script which keeps track of files.

But the question is: why don't you use a monitor input? Unless you absolutely need the sinkholing functionality and can't get around it another way (with logrotate or similar).
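A minimal sketch of what such a helper could look like, assuming the files first land in a staging directory and are only moved into the directory watched by the batch input after a duplicate check. The function names (`file_digest`, `open_seen_db`, `hand_over`) and the SQLite table are hypothetical and not part of Splunk; the check hashes the full file content with SHA-256, so it only catches byte-identical redeliveries.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file's full contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_seen_db(db_path: str) -> sqlite3.Connection:
    """Open (or create) the hypothetical tracking database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen (digest TEXT PRIMARY KEY, name TEXT)"
    )
    conn.commit()
    return conn


def hand_over(src: Path, batch_dir: Path, conn: sqlite3.Connection) -> bool:
    """Move src into batch_dir unless identical content was seen before.

    Returns True if the file was handed over to the batch directory,
    False if it was deleted as a duplicate.
    """
    digest = file_digest(src)
    try:
        # The PRIMARY KEY constraint rejects a digest that is already recorded.
        with conn:
            conn.execute(
                "INSERT INTO seen (digest, name) VALUES (?, ?)",
                (digest, src.name),
            )
    except sqlite3.IntegrityError:
        src.unlink()  # duplicate content: drop it instead of ingesting again
        return False
    batch_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(batch_dir / src.name))
    return True
```

Note that the digest is recorded before the move, so if the move fails the file would be treated as a duplicate on the next attempt; a real script would need to handle that case, and probably prune old digests as well.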


zapping575
Communicator

Hi @PickleRick

Thanks for your reply.

I also think that keeping track of files is something I will have to implement myself.

I was just hoping to be able to use what Splunk already provides.

As for why I am not using monitor inputs:

I have an HTTP endpoint that receives log files from another system and extracts them to disk, where the forwarder then picks them up. I could use monitor mode, but because there is no log rotation or similar mechanism, the disk would eventually fill up.
What is charming about batch mode is that it ensures the disk space is freed again once a new file has been completely ingested.


PickleRick
SplunkTrust

Yeah, but you end up with duplicates 🙂

If you can ensure that there is a maximum possible period within which duplicates can arrive, you could get away with a monitor input and an external script that cleans files older than a given age out of the directory. That could be an alternative approach (and probably easier to implement).
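A rough sketch of such a cleanup script, assuming it is run periodically (for example from cron) against the monitored directory. The function name `purge_old_files` and its parameters are hypothetical. It relies only on file modification time and does not check whether the forwarder has finished reading a file, so the retention period is assumed to be much longer than ingestion takes and at least as long as the window in which duplicates can arrive.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_files(
    directory: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete regular files in directory whose mtime is older than max_age_seconds.

    Returns the list of paths that were removed. Subdirectories are skipped.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in directory.iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry)
    return removed
```

For example, `purge_old_files(Path("/path/to/monitored/dir"), 7 * 24 * 3600)` would remove files last modified more than a week ago; the path and the one-week retention here are placeholders.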


zapping575
Communicator

Hehe 🙂
Alright, thanks for your input!
