Splunk Cloud Platform

How to deal with duplicate records?

alexrp25
Engager
Our app runs inside a Docker container.  We can access it only through standard web interfaces and APIs, and we have no access to the underlying operating system.  So we retrieve the logs through an API and store them on a remote server.  There we unzip them and put them in known paths, and the Splunk UF on that server forwards them to Splunk.
 
We retrieve our logs every hour, and each download overwrites the previous one.  To the Splunk UF they therefore appear to be new logs.  In reality each is the same file as before, just with another hour of data in it.
 
Could you please advise on how to deal with this seemingly duplicate log data? Is there a way to handle it at search time in a Splunk search pipeline? Or should we adjust our log collection process before the Splunk UF sends the logs to Splunk Cloud Platform?
 
Thank you.

richgalloway
SplunkTrust

The best way to deal with duplicate records is to prevent them from occurring.  Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore duplicates at search time, they still bear a cost.  Adjust your log collection process to avoid duplicate data as much as possible.

---
If this reply helps you, Karma would be appreciated.
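
One possible way to adjust the collection process is to stop overwriting the monitored file and instead append only the part of each hourly download that has not been delivered yet. The sketch below is illustrative only and rests on assumptions not confirmed in this thread: that each hourly file is the previous file plus new lines at the end, and that the UF can be pointed at a separate file that only ever grows. The function name `append_new_bytes`, the three paths, and the JSON state file are all hypothetical.

```python
import hashlib
import json
import os


def append_new_bytes(fetched_path, monitored_path, state_path):
    """Append the not-yet-delivered tail of a cumulative log to monitored_path.

    Assumes fetched_path is re-downloaded each hour and normally contains the
    previous content plus new lines. Returns the number of bytes appended.
    """
    # State records how many bytes were already delivered and a hash of them.
    state = {"offset": 0, "prefix_sha256": hashlib.sha256(b"").hexdigest()}
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as fh:
            state = json.load(fh)

    # Reads the whole file into memory; fine for a sketch, not for huge logs.
    with open(fetched_path, "rb") as fh:
        data = fh.read()

    offset = state["offset"]
    prefix_hash = hashlib.sha256(data[:offset]).hexdigest()
    # If the file shrank or its beginning changed, the source log was rotated
    # or rewritten, so treat the whole download as new data.
    if len(data) < offset or prefix_hash != state["prefix_sha256"]:
        offset = 0

    # Deliver only complete lines; a trailing partial line waits for next run.
    end = max(data.rfind(b"\n") + 1, offset)
    new_bytes = data[offset:end]
    if new_bytes:
        with open(monitored_path, "ab") as out:
            out.write(new_bytes)

    # Persist the new position atomically so a crash cannot corrupt the state.
    new_state = {"offset": end, "prefix_sha256": hashlib.sha256(data[:end]).hexdigest()}
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(new_state, fh)
    os.replace(tmp_path, state_path)
    return len(new_bytes)
```

In this arrangement the hourly job would unzip each download to a staging location, call `append_new_bytes` on it, and the UF would monitor only the growing file rather than the overwritten one. If the hourly files are not strict extensions of each other, the prefix check falls back to re-sending the whole file, so duplicates are still possible in that case.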

alexrp25
Engager

Hello Rich

Thank you very much for the advice. Is there a way I could make this log collection adjustment on the Universal Forwarder itself? I was wondering if I could make it ignore the duplicates before sending to Splunk Cloud.

Thank you. 

richgalloway
SplunkTrust

The UF has no way of knowing what is a duplicate and what is not, especially if the duplication occurs across instances of an input file.

---
If this reply helps you, Karma would be appreciated.