Prevent duplicates from generic S3 input

cbreezier
Engager

I've set up a generic S3 input and it's working pretty well. However, I sometimes get duplicate events.

I believe the issue is explained here:

The S3 data input is not intended to read frequently modified files. If a file is modified after it has been indexed, the Splunk platform indexes the file again, resulting in duplicated data. Use key, blocklist, and allowlist options to instruct the add-on to index only those files that you know will not be modified later.

https://docs.splunk.com/Documentation/AddOns/released/AWS/S3

My setup involves S3 files that may be updated over a 5-minute window. After those 5 minutes, they are never modified again. Let's start by assuming that I can't change that.

In most cases the file contents don't actually change; only the last-modified date does.

I'd like the ability to do the following:

  1. Only index files that are older than 5 minutes, or
  2. Keep a CRC/hash of each file and only reindex if the hash changes, or
  3. Keep track of which line we're up to in each file and only index appended lines

Option 3 is ideal. Option 1 completely fixes the problem for me, at the cost of some indexing delay. Option 2 greatly reduces the problem (and I think Splunk already does this for local files?).
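
To make options 1 and 2 concrete, here is a minimal sketch in Python of the selection logic I have in mind. It is not anything the add-on provides. It assumes object metadata shaped like the entries in boto3's `list_objects_v2` response (`Key`, `LastModified`, `ETag`), and it uses the S3 ETag as a stand-in for a content hash, which is an assumption (for multipart uploads the ETag is not a plain MD5, though it still changes when the content changes). The function name `select_settled_objects`, the `seen_etags` dict, and the `settle` parameter are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone


def select_settled_objects(objects, seen_etags, now=None, settle=timedelta(minutes=5)):
    """Return the keys that look safe to index right now.

    objects    -- iterable of dicts with "Key", "LastModified" (tz-aware
                  datetime) and "ETag", as in a list_objects_v2 response
    seen_etags -- dict mapping key -> ETag already indexed (updated in place)
    settle     -- how long an object must be unmodified before it is indexed
    """
    now = now or datetime.now(timezone.utc)
    to_index = []
    for obj in objects:
        # Option 1: skip anything modified within the settle window.
        if now - obj["LastModified"] < settle:
            continue
        # Option 2: skip objects whose fingerprint matches what was indexed,
        # even if the last-modified date has moved.
        if seen_etags.get(obj["Key"]) == obj["ETag"]:
            continue
        seen_etags[obj["Key"]] = obj["ETag"]
        to_index.append(obj["Key"])
    return to_index
```

A filter like this would have to run outside the generic S3 input, for example in a script that decides which keys get handed to Splunk, and `seen_etags` would need to be persisted between runs. Whether the add-on can do something equivalent natively is what I'm asking.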

Is any of what I'm asking for possible? Or is there another solution to my problem?

Thanks!

atanu
Engager

Were you able to resolve this issue?
I am facing a similar challenge with a deployed application whose S3 files are updated frequently.