Splunk Cloud Platform

How to deal with duplicate records?

alexrp25
Engager
Our app is enclosed within a Docker container environment.  We can access the app only through standard web interfaces and APIs.  We have no access to the underlying operating system.  So, through an API we retrieve the logs and store them on a remote server.  We unzip them, put them in the known paths, and the Splunk UF on that device forwards them to Splunk.
 
We retrieve our logs every hour, and each retrieval overwrites the files already in place.  This means that the Splunk UF sees them as new logs.  However, each one is the same file as before, just with another hour of data added.
 
Could you please advise on how to deal with this seemingly duplicate log data? Is there a way to handle it in the search pipeline in Splunk? Or should we adjust our log collection process before the Splunk UF sends the data to Splunk Cloud Platform?
 
Thank you.

richgalloway
SplunkTrust

The best way to deal with duplicate records is to prevent them from occurring.  Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore duplicates at search time, they still bear a cost.  Adjust your log collection process to avoid duplicate data as much as possible.
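
As a rough illustration of what adjusting the collection process could look like, here is a minimal Python sketch. It assumes each hourly download is the previous file plus new data at the end, as described in the question. The function name `append_new_bytes`, the JSON state file, and all paths are hypothetical and not part of any Splunk tooling. The idea is to remember how many bytes of each file have already been passed on, and to append only the unseen tail to the file the UF monitors instead of overwriting it.

```python
import json
from pathlib import Path


def append_new_bytes(downloaded: Path, monitored: Path, state_file: Path) -> int:
    """Append only the unseen tail of `downloaded` to `monitored`.

    Returns the number of bytes appended. All names here are illustrative.
    """
    # Load the offsets remembered from earlier runs (empty on the first run).
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    key = str(monitored)
    offset = state.get(key, 0)

    # If the download is smaller than what we already forwarded, assume the
    # source log was rotated and start again from the beginning.
    if downloaded.stat().st_size < offset:
        offset = 0

    with downloaded.open("rb") as src:
        src.seek(offset)
        new_data = src.read()

    # Forward only complete lines; a partial last line waits for the next run.
    new_data = new_data[: new_data.rfind(b"\n") + 1]

    if new_data:
        with monitored.open("ab") as dst:
            dst.write(new_data)

    # Persist the new offset so the next hourly run skips these bytes.
    state[key] = offset + len(new_data)
    state_file.write_text(json.dumps(state))
    return len(new_data)
```

You would run something like this after each hourly retrieval, with the download going to a staging directory that the UF does not monitor. If the files are not strictly append-only, an offset is not enough and this approach would not apply as written.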

---
If this reply helps you, Karma would be appreciated.

alexrp25
Engager

Hello Rich

Thank you very much for the advice. Is there a way I could make the log collection adjustment on the Universal Forwarder itself? I was wondering if I could make it ignore the duplicates before sending the data to Splunk Cloud.

Thank you. 

richgalloway
SplunkTrust

The UF has no way of knowing what is a duplicate and what is not, especially if the duplication occurs across instances of an input file.
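
In other words, any filtering has to happen before the data reaches the path the UF monitors. As one possible sketch in Python (the function `filter_unseen_lines` and the in-memory set are hypothetical, and you would need to persist the hashes between hourly runs yourself), the collection script could hash every line and pass on only lines it has not seen before:

```python
import hashlib


def filter_unseen_lines(lines, seen_hashes):
    """Yield only the lines whose SHA-256 digest is not yet in `seen_hashes`.

    `seen_hashes` is a set that is updated in place as new lines are found.
    """
    for line in lines:
        digest = hashlib.sha256(line.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            yield line


# Example: the second download repeats the first hour and adds one new line.
seen = set()
first_hour = list(filter_unseen_lines(["10:00 start", "10:30 ok"], seen))
second_hour = list(filter_unseen_lines(["10:00 start", "10:30 ok", "11:15 ok"], seen))
```

Be aware that this drops any legitimately repeated line as well, so it is only reasonable when every event carries something unique such as a precise timestamp.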
