Splunk Cloud Platform

How to deal with duplicate records?

alexrp25
Engager
Our app runs inside a Docker container environment. We can access it only through standard web interfaces and APIs, and we have no access to the underlying operating system. So we retrieve the logs through an API and store them on a remote server, unzip them into the known paths, and the Splunk UF on that server forwards them to Splunk.
 
We retrieve the logs every hour, and each download overwrites what is already there. To the Splunk UF the files therefore appear to be new logs, but each one is really the same file as before, just with another hour of data in it.
 
Could you please advise on how to deal with this seemingly duplicate log data? Is there a way to handle it in a Splunk search pipeline? Or should we adjust our log collection process before the Splunk UF sends the logs to the Splunk Cloud Platform?
 
Thank you.
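For concreteness, a minimal Python sketch of the hourly collection step described above; the API endpoint, archive handling, and paths are hypothetical placeholders rather than the poster's actual setup.

```python
# Hypothetical sketch of the hourly collection step described above.
# The endpoint URL and paths are placeholders, not actual values.
import io
import zipfile
import requests

API_URL = "https://app.example.com/api/logs/export"  # hypothetical export endpoint
MONITOR_DIR = "/var/log/app"                          # directory the Splunk UF monitors

def fetch_and_overwrite():
    """Download the current log archive and unpack it over the previous copy."""
    resp = requests.get(API_URL, timeout=60)
    resp.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
        # extractall() replaces the existing files, so the UF sees files that
        # are mostly data it has already forwarded, plus one new hour.
        archive.extractall(MONITOR_DIR)

if __name__ == "__main__":
    fetch_and_overwrite()
```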

richgalloway
SplunkTrust

The best way to deal with duplicate records is to prevent them from occurring in the first place. Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore dups at search time, they still carry a cost. Adjust your log collection process to avoid duplicate data as much as possible.

---
If this reply helps you, Karma would be appreciated.
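
One way to act on this advice, assuming the hourly download really is the previous file plus an appended hour, is to make the delivery step append-only: keep the previous download outside the monitored path and write only the new tail into the file the UF tails. The sketch below is a minimal illustration of that idea; the paths and file names are hypothetical.

```python
# Minimal sketch of a collection-side adjustment: instead of overwriting the
# monitored file, append only the bytes not present in the previous download.
# Paths and file names are hypothetical.
from pathlib import Path

PREVIOUS = Path("/var/lib/logsync/app.log.prev")   # last delivered copy (not monitored)
MONITORED = Path("/var/log/app/app.log")           # file the UF tails

def deliver(new_download: Path) -> None:
    """Append only the new tail of the hourly download to the monitored file."""
    new_data = new_download.read_bytes()
    old_data = PREVIOUS.read_bytes() if PREVIOUS.exists() else b""

    if new_data.startswith(old_data):
        # Same file with another hour appended: deliver only the delta.
        delta = new_data[len(old_data):]
        with MONITORED.open("ab") as out:
            out.write(delta)
    else:
        # The file was rotated or rewritten: treat it as a fresh file.
        MONITORED.write_bytes(new_data)

    # Remember this download so the next hour's delta can be computed.
    PREVIOUS.parent.mkdir(parents=True, exist_ok=True)
    PREVIOUS.write_bytes(new_data)

if __name__ == "__main__":
    deliver(Path("/tmp/app.log"))   # hypothetical path to the freshly unzipped file
```

Because a monitor input forwards only data added to a file it is already tracking, appending just the delta keeps the earlier hours from being indexed again.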

alexrp25
Engager

Hello Rich

Thank you very much for the advice. Is there a way I could make that log collection adjustment on the Universal Forwarder? I was wondering whether I could make it ignore the duplicates before sending the data to Splunk Cloud.

Thank you. 


richgalloway
SplunkTrust

The UF has no way of knowing what is a duplicate and what is not, especially if the duplication occurs across instances of an input file.

---
If this reply helps you, Karma would be appreciated.