Prevent duplicates from generic S3 input

cbreezier · 09-14-2020

I've set up a generic S3 input and it's working pretty well. However, I sometimes get duplicate events.

I believe the issue is explained here:

> The S3 data input is not intended to read frequently modified files. If a file is modified after it has been indexed, the Splunk platform indexes the file again, resulting in duplicated data. Use key, blocklist, and allowlist options to instruct the add-on to index only those files that you know will not be modified later.

https://docs.splunk.com/Documentation/AddOns/released/AWS/S3

My setup involves S3 files that may be updated for a period of 5 minutes. After 5 minutes, they'll never be modified again. Let's start by assuming that I can't change that.

In the majority of cases, the file contents aren't actually changed - only the last modification date is changed.

I'd like the ability to do the following:

1. Only index files that are older than 5 minutes, or
2. Keep a CRC/hash of each file and only reindex if the hash changes, or
3. Keep track of which line we're up to in each file and only index appended lines

Option 3 is ideal. Option 1 completely fixes the problem for me (at the cost of some indexing delay). Option 2 greatly reduces the problem (and I think Splunk already does this for local files?).
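To make options 1 and 2 concrete, here's a rough sketch of the filtering logic I have in mind. This is not something the add-on does, as far as I know. It's plain Python over object metadata shaped like the `Contents` entries returned by boto3's `list_objects_v2` (`Key`, `LastModified`, `ETag`). The function name and the `seen_etags` checkpoint dict are hypothetical, and I'm assuming the ETag works as a content fingerprint, which S3 doesn't guarantee for every upload type (multipart or KMS-encrypted uploads, for example).

```python
from datetime import datetime, timedelta, timezone

# Assumption from my setup: files can change for 5 minutes, then never again.
SETTLE_PERIOD = timedelta(minutes=5)


def select_objects_to_index(objects, seen_etags, now=None):
    """Return objects that are settled and whose ETag differs from the last indexed one.

    objects:    iterable of dicts with "Key", "LastModified" (timezone-aware
                datetime) and "ETag".
    seen_etags: hypothetical checkpoint store mapping key -> ETag of the
                version already indexed; updated in place.
    """
    now = now or datetime.now(timezone.utc)
    selected = []
    for obj in objects:
        # Option 1: skip files still inside the 5-minute modification window.
        if now - obj["LastModified"] < SETTLE_PERIOD:
            continue
        # Option 2: skip files whose fingerprint matches what was already indexed,
        # even if the last-modified timestamp has moved.
        if seen_etags.get(obj["Key"]) == obj["ETag"]:
            continue
        seen_etags[obj["Key"]] = obj["ETag"]
        selected.append(obj)
    return selected
```

With something like this, a file whose timestamp is touched but whose ETag is unchanged would be skipped, and a file younger than 5 minutes would just be picked up on a later pass.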

Is any of what I'm asking for possible? Or is there another solution to my problem?

Thanks!

atanu · 11-02-2021

Hi cbreezier,

Were you able to resolve this issue?

I am also facing a similar challenge with a deployed application where S3 files are frequently updated.
