Getting Data In

Ingesting logs through custom API creates duplicates

JLopez
Explorer

Hi Splunkers,

Let me provide a bit of background: we are ingesting logs into Splunk Cloud from our DLP service provider's API, sending them in via HEC.

The script that calls the API reads the last 24 hours of events, gets the results in JSON format, and sends them to Splunk Cloud using HEC.

The problem I have is that every pull duplicates some of the events, e.g. the same event_id gets ingested again because events change status on a daily basis, so the script picks them up again.

This makes reporting a nightmare. I know I can deduplicate at search time, but is there an easier way for Splunk to remove those duplicates and keep a single copy per event_id?
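For reference, one way to avoid sending the repeats in the first place is for the pull script to checkpoint which (event_id, status) pairs it has already forwarded to HEC. A minimal sketch in Python (the checkpoint file name and the event field names are assumptions, not from the actual script):

```python
import json
from pathlib import Path

# Hypothetical checkpoint file persisted between daily runs.
CHECKPOINT = Path("sent_events.json")

def load_seen(path=CHECKPOINT):
    """Load the set of (event_id, status) pairs already forwarded to HEC."""
    if path.exists():
        return {tuple(pair) for pair in json.loads(path.read_text())}
    return set()

def save_seen(seen, path=CHECKPOINT):
    """Persist the forwarded pairs for the next run."""
    path.write_text(json.dumps(sorted(seen)))

def filter_new_events(events, seen):
    """Keep only events whose (event_id, status) combination is unseen.

    A status change produces a new pair, so updated events still go
    through; unchanged events re-read inside the 24-hour window are
    dropped instead of being sent to HEC again.
    """
    fresh = []
    for ev in events:
        key = (ev["event_id"], ev["status"])
        if key not in seen:
            seen.add(key)
            fresh.append(ev)
    return fresh
```

The remaining duplicates are then only the genuine status updates, which is usually what you want to keep.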




ITWhisperer
SplunkTrust

The simple answer is no - once the event is there, it is there until it expires, i.e. passes the retention period, or it is deleted (which is a complex and risky operation).

A more complex answer is yes, but it involves the delete command, which has its challenges and is restricted by role, i.e. only those who have been assigned the role can delete events. Even then, the events aren't actually deleted; they are just made invisible to searches.

A pragmatic answer (as you hinted at) is to deduplicate or merge the events with the same event_id in your searches, which, as you point out, has its own challenges.
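For example, a search along these lines keeps only the most recent copy of each event_id (the index, sourcetype, and field names here are assumptions based on your post):

```
index=dlp sourcetype=dlp:json
| dedup event_id sortby -_time
```

Alternatively, `| stats latest(status) as status, latest(_time) as _time by event_id` merges the duplicates while keeping each event's most recent status, which can be friendlier for reporting.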
