<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to deal with duplicate records? in Splunk Cloud Platform</title>
    <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612109#M1757</link>
    <description>&lt;P&gt;The best way to deal with duplicate records is to prevent them from occurring. Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore duplicates at search time, they still carry a cost. Adjust your log collection process to avoid duplicate data as much as possible.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Sep 2022 14:27:27 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2022-09-06T14:27:27Z</dc:date>
    <item>
      <title>How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/611992#M1756</link>
      <description>&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;Our app is enclosed within a Docker container environment.&amp;nbsp; We can access the app only through standard web interfaces and APIs.&amp;nbsp; We have no access to the underlying operating system.&amp;nbsp; So, through an API we retrieve the logs and store them on a remote server.&amp;nbsp; We unzip them, put them in the known paths, and the Splunk UF on that device forwards them to Splunk.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;We retrieve our logs every hour.&amp;nbsp; They overwrite what is there.&amp;nbsp; This means that when seen by the Splunk UF, they appear to be new logs.&amp;nbsp; However, within them they are the same file, just with another hour of data in them.&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class=""&gt;Could you please advise on how to deal with those seemingly duplicate log information? Is there a way to work the results in a Splunk pipe search? Or should we adjust it in our log collection process before the Splunk UF send them to the Splunk Cloud Plattform?&lt;/DIV&gt;
&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class=""&gt;Thank you.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 05 Sep 2022 22:46:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/611992#M1756</guid>
      <dc:creator>alexrp25</dc:creator>
      <dc:date>2022-09-05T22:46:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612109#M1757</link>
      <description>&lt;P&gt;The best way to deal with duplicate records is to prevent them from occurring. Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore duplicates at search time, they still carry a cost. Adjust your log collection process to avoid duplicate data as much as possible.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 14:27:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612109#M1757</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-09-06T14:27:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612127#M1758</link>
      <description>&lt;P&gt;Hello Rich,&lt;/P&gt;&lt;P&gt;Thank you very much for the advice. Is there a way I could make this log-collection adjustment on the Universal Forwarder? I was wondering whether I could make it ignore the duplicates before they are sent to Splunk Cloud.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 17:16:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612127#M1758</guid>
      <dc:creator>alexrp25</dc:creator>
      <dc:date>2022-09-06T17:16:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612135#M1759</link>
      <description>&lt;P&gt;The UF has no way of knowing what is a duplicate and what is not, especially if the duplication occurs across instances of an input file.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 18:36:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612135#M1759</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-09-06T18:36:08Z</dc:date>
    </item>
  </channel>
</rss>

