Splunk Enterprise

Duplicate Log Entries from S3 Bucket

Nimi1
Loves-to-Learn

Hello Everyone,

I've encountered an issue where certain customers appear to have duplicate ELB access logs. During a routine check, I noticed instances of identical events being logged with the exact same timestamp, which shouldn't normally occur.

I'm utilizing the Splunk_TA_aws app for ingesting logs, specifying each S3 bucket and the corresponding ELB log prefix as inputs. My search pattern is index=customer-index-{customer_name} sourcetype="aws:elb:accesslogs", aimed at isolating the data per customer.

Upon reviewing the original logs directly within the S3 buckets, I confirmed that the duplicates are not present at the source; they only appear once ingested into Splunk. This leads me to wonder if there might be a configuration or processing step within Splunk or the AWS Add-on that could be causing these duplicates.

Has anyone experienced a similar issue or could offer insights into potential causes or solutions? Any advice or troubleshooting tips would be greatly appreciated.

Here we can see the same timestamp on the duplicated log events:

Screen Shot 2024-04-04 at 13.32.36 1.png

If I add | dedup _raw, the number of events drops from 12,710 to 6,535.

 

Thank you in advance for your assistance.

 


KothariSurbhi
Loves-to-Learn Everything

Hello @Nimi1 ,

The following two points might explain the duplicate events ingested into Splunk:

1 - If you have multiple forwarders and the same input (such as ELB access logs) is enabled on all of them, each forwarder independently sends the same logs, which results in repeated entries in your Splunk data.

2 - If multiple inputs were enabled by mistake for the same source within Splunk, such as ELB access logs, the same logs are ingested more than once, which also leads to duplicates in your Splunk data.
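
Both points come down to the same bucket and prefix being configured more than once, so one way to check is to compare the input stanzas from every forwarder. Here is a minimal Python sketch of that check. It assumes the S3 inputs live in stanzas whose settings include `bucket_name` and `key_name` (for the log prefix); those key names, the `aws_s3://` stanza names in the usage note, and the function name `find_duplicate_s3_inputs` are assumptions for illustration, so please compare them with your actual inputs.conf before relying on it.

```python
import configparser
from collections import defaultdict


def find_duplicate_s3_inputs(conf_texts):
    """Find bucket/prefix pairs that are configured in more than one stanza.

    conf_texts maps a label (for example a forwarder name) to the text of
    that forwarder's inputs.conf. The keys bucket_name and key_name are
    assumed names; adjust them to match your configuration.
    """
    by_target = defaultdict(list)
    for label, text in conf_texts.items():
        # Parse each file on its own so identical stanza names on
        # different forwarders are not merged together.
        parser = configparser.ConfigParser(
            delimiters=("=",), interpolation=None, strict=False
        )
        parser.read_string(text)
        for stanza in parser.sections():
            bucket = parser.get(stanza, "bucket_name", fallback=None)
            if bucket is None:
                continue  # not an S3 input stanza
            prefix = parser.get(stanza, "key_name", fallback="")
            by_target[(bucket, prefix)].append(f"{label}:{stanza}")
    # Keep only targets that more than one stanza points at.
    return {t: names for t, names in by_target.items() if len(names) > 1}
```

For example, passing `{"hf1": text_of_first_inputs_conf, "hf2": text_of_second_inputs_conf}` returns a dictionary keyed by `(bucket, prefix)` listing every stanza that reads that location. Note that this only catches exact matches; one prefix nested inside another would need an extra comparison.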

If this reply helps you, Karma would be appreciated.


Nimi1
Loves-to-Learn

Thanks for your answer, KothariSurbhi.

After some debugging, I've discovered that on February 23rd Splunk pulled logs again from many buckets, covering a wide range of original log dates.

It seems that logs that had already been ingested into Splunk in 2023 were ingested again on February 23, 2024, for a reason that is still unclear.

Nothing happened on the AWS side, and the S3 buckets look perfectly fine.
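
In case it helps anyone repeat this kind of debugging, below is a minimal Python sketch of the idea: group events by their raw text and count, per day, how many repeat copies were indexed. It assumes the events were exported from a search as pairs of raw text and index time in epoch seconds (Splunk exposes the latter as `_indextime`); the export step and the function name `reingestion_dates` are hypothetical and not part of the add-on.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def reingestion_dates(events):
    """Count repeat copies of events per UTC day on which they were indexed.

    events is an iterable of (raw_text, indextime_epoch_seconds) pairs.
    The earliest copy of each raw event counts as the original; every
    later copy is counted against the day it was indexed.
    """
    index_times = defaultdict(list)
    for raw, indextime in events:
        index_times[raw].append(int(indextime))

    repeats = Counter()
    for times in index_times.values():
        times.sort()
        for t in times[1:]:  # skip the first (original) copy
            day = datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d")
            repeats[day] += 1
    return dict(repeats)
```

If nearly all repeat copies fall on a single day, as with February 23 here, that points to a one-off re-read of the buckets rather than an input that keeps duplicating data.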

 

Screen Shot 2024-04-08 at 12.37.18.png
