<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to deal with duplicate records? in Splunk Cloud Platform</title>
    <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612109#M1757</link>
    <description>&lt;P&gt;The best way to deal with duplicate records is to prevent them from occurring. Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore duplicates at search time, they still carry a cost. Adjust your log collection process to avoid duplicate data as much as possible.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Sep 2022 14:27:27 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2022-09-06T14:27:27Z</dc:date>
    <item>
      <title>How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/611992#M1756</link>
      <description>&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;Our app is enclosed within a Docker container environment.&amp;nbsp; We can access the app only through standard web interfaces and APIs.&amp;nbsp; We have no access to the underlying operating system.&amp;nbsp; So, through an API we retrieve the logs and store them on a remote server.&amp;nbsp; We unzip them, put them in the known paths, and the Splunk UF on that device forwards them to Splunk.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;
&lt;DIV class=""&gt;We retrieve our logs every hour.&amp;nbsp; They overwrite what is there.&amp;nbsp; This means that when seen by the Splunk UF, they appear to be new logs.&amp;nbsp; However, within them they are the same file, just with another hour of data in them.&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class=""&gt;Could you please advise on how to deal with those seemingly duplicate log information? Is there a way to work the results in a Splunk pipe search? Or should we adjust it in our log collection process before the Splunk UF send them to the Splunk Cloud Plattform?&lt;/DIV&gt;
&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class=""&gt;Thank you.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 05 Sep 2022 22:46:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/611992#M1756</guid>
      <dc:creator>alexrp25</dc:creator>
      <dc:date>2022-09-05T22:46:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612109#M1757</link>
      <description>&lt;P&gt;The best way to deal with duplicate records is to prevent them from occurring. Duplicate events in Splunk consume license quota and storage, so even though there are ways to ignore duplicates at search time, they still carry a cost. Adjust your log collection process to avoid duplicate data as much as possible.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 14:27:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612109#M1757</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-09-06T14:27:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612127#M1758</link>
      <description>&lt;P&gt;Hello Rich,&lt;/P&gt;&lt;P&gt;Thank you very much for the advice. Is there a way I could make this log-collection adjustment on the Universal Forwarder? I was wondering whether I could make it ignore the duplicates before they are sent to Splunk Cloud.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 17:16:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612127#M1758</guid>
      <dc:creator>alexrp25</dc:creator>
      <dc:date>2022-09-06T17:16:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to deal with duplicate records?</title>
      <link>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612135#M1759</link>
      <description>&lt;P&gt;The UF has no way of knowing what is a duplicate and what is not, especially if the duplication occurs across instances of an input file.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 18:36:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Cloud-Platform/How-to-deal-with-duplicate-records/m-p/612135#M1759</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2022-09-06T18:36:08Z</dc:date>
    </item>
  </channel>
</rss>

