Solved: How to deal with duplicate records?

alexrp25 · ‎09-05-2022

Our app runs inside a Docker container. We can access it only through standard web interfaces and APIs, and we have no access to the underlying operating system. So we retrieve the logs through an API and store them on a remote server. There we unzip them and put them in known paths, and the Splunk UF on that server forwards them to Splunk.

We retrieve our logs every hour, and each retrieval overwrites the files that are already there. To the Splunk UF they therefore appear to be new logs. In reality, each one is the same file as before, just with another hour of data in it.

Could you please advise on how to deal with this seemingly duplicate log data? Is there a way to filter it out of the results in a Splunk search pipeline? Or should we adjust our log collection process before the Splunk UF sends the data to Splunk Cloud Platform?

Thank you.

richgalloway · ‎09-06-2022

The best way to deal with duplicate records is to prevent them from occurring. Duplicate events in Splunk consume license quota and storage so, even though there are ways to ignore duplicates at search time, they still bear a cost. Adjust your log collection process to avoid duplicate data as much as possible.
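
As a rough illustration of what "adjusting the collection process" could look like, here is a minimal Python sketch. It rests on assumptions not confirmed in this thread: that each hourly download is a cumulative copy of the previous one (old content unchanged, new lines at the end), that the download is unzipped into a staging directory the UF does not monitor, and that the UF monitors a separate file that is only ever appended to. The function name `append_new_bytes` and all three paths are hypothetical.

```python
from pathlib import Path


def append_new_bytes(fetched: Path, monitored: Path, state: Path) -> int:
    """Append only the not-yet-copied tail of `fetched` to `monitored`.

    `state` is a small text file holding the byte offset of `fetched`
    that was already copied on the previous run.
    Returns the number of bytes appended.
    """
    # Assumption: the fetched file is cumulative, so everything before
    # the saved offset has already been written to the monitored file.
    offset = int(state.read_text()) if state.exists() else 0

    if fetched.stat().st_size < offset:
        # The source is shorter than last time: assume the application
        # rotated its log and copy the new file from the beginning.
        offset = 0

    with fetched.open("rb") as src, monitored.open("ab") as dst:
        src.seek(offset)
        tail = src.read()
        dst.write(tail)

    # Remember how far we got for the next hourly run.
    state.write_text(str(offset + len(tail)))
    return len(tail)
```

The idea would be to run this once per log file after each hourly retrieval, so the monitored file grows instead of being replaced and the forwarder only sees the added data. If the real logs are not strictly cumulative, this offset-based approach would not be safe and a different comparison would be needed.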

---
If this reply helps you, Karma would be appreciated.

alexrp25 · ‎09-06-2022

Hello Rich

Thank you very much for the advice. Is there a way to make the log collection adjustment on the Universal Forwarder? I was wondering if I could make it ignore the duplicates before sending to Splunk Cloud.

Thank you.

richgalloway · ‎09-06-2022

The UF has no way of knowing what is a duplicate and what is not, especially if the duplication occurs across instances of an input file.

