<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do we define a duplicate record? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710823#M117406</link>
    <description>&lt;P&gt;Splunk does not work like a database in this respect. So, it depends on how Splunk has been set up to detect "duplicates" of this nature. This is normally done with searches in reports or alerts or dashboards. These will normally depend on your data.&lt;/P&gt;&lt;P&gt;What searches do you already have set up?&lt;/P&gt;&lt;P&gt;What does your data look like?&lt;/P&gt;&lt;P&gt;How is it being ingested into Splunk?&lt;/P&gt;&lt;P&gt;What criteria do you want to use to determine that an event represents a duplicate?&lt;/P&gt;&lt;P&gt;Please provide as much detail as you can (without giving away sensitive information).&lt;/P&gt;</description>
    <pubDate>Thu, 06 Feb 2025 09:22:22 GMT</pubDate>
    <dc:creator>ITWhisperer</dc:creator>
    <dc:date>2025-02-06T09:22:22Z</dc:date>
    <item>
      <title>How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710813#M117404</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We are currently using a customized app to connect to a case/monitoring system and retrieve data.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I found out that Splunk has the ability to detect whether data has already been indexed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But in the following scenario, will it treat the event as a duplicate or as new data, given that the updated case has a new close time?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;One of the previously closed cases has been reopened and closed again with a new case closed time. Will Splunk Enterprise treat this as new data to index?&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2025 08:44:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710813#M117404</guid>
      <dc:creator>ws</dc:creator>
      <dc:date>2025-02-06T08:44:23Z</dc:date>
    </item>
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710821#M117405</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/276234"&gt;@ws&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Splunk indexes all new data; the only exception is when the first 256 chars of the event are the same.&lt;/P&gt;&lt;P&gt;Then, after indexing, you can dedup the results, excluding duplicated data based on your requirements.&lt;/P&gt;&lt;P&gt;Deduping is usually done on one or more fields; it's also possible to find fully duplicated events by deduping on _raw.&lt;/P&gt;&lt;P&gt;Ciao.&lt;/P&gt;&lt;P&gt;Giuseppe&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2025 09:21:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710821#M117405</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2025-02-06T09:21:21Z</dc:date>
    </item>
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710823#M117406</link>
      <description>&lt;P&gt;Splunk does not work like a database in this respect. So, it depends on how Splunk has been set up to detect "duplicates" of this nature. This is normally done with searches in reports or alerts or dashboards. These will normally depend on your data.&lt;/P&gt;&lt;P&gt;What searches do you already have set up?&lt;/P&gt;&lt;P&gt;What does your data look like?&lt;/P&gt;&lt;P&gt;How is it being ingested into Splunk?&lt;/P&gt;&lt;P&gt;What criteria do you want to use to determine that an event represents a duplicate?&lt;/P&gt;&lt;P&gt;Please provide as much detail as you can (without giving away sensitive information).&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2025 09:22:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710823#M117406</guid>
      <dc:creator>ITWhisperer</dc:creator>
      <dc:date>2025-02-06T09:22:22Z</dc:date>
    </item>
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710828#M117407</link>
      <description>&lt;P&gt;Are you sure you're not talking about the first 256 bytes of a monitored file? (Of course, the header length is configurable.) The only duplication detection I recall is connected with useACK, and even then it indexes an event twice but emits a warning, AFAIR.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2025 11:58:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710828#M117407</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2025-02-06T11:58:10Z</dc:date>
    </item>
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710833#M117409</link>
      <description>&lt;P&gt;Splunk cannot and does not detect if data has already been indexed. As &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/161352"&gt;@gcusello&lt;/a&gt; said, it will attempt to avoid re-ingesting data, but that's not perfect.&lt;/P&gt;&lt;P&gt;It's up to the app doing the ingestion to prevent reading the same data twice. In DB Connect, for example, a "rising column" is defined to identify unique records. Your app could do something similar, using case ID and Closed Time, perhaps.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2025 12:43:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/710833#M117409</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2025-02-06T12:43:07Z</dc:date>
    </item>
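    <!--
    Editor's sketch of the "rising column" style idea suggested in the reply above:
    the ingesting app remembers which (case ID, closed time) pairs it has already
    forwarded. This is a minimal illustration, not part of any Splunk API. The class
    name Checkpoint, the method filter_new, and the record keys "id" and "closed_time"
    are all hypothetical, and the state is kept in memory only; a real app would
    need to persist it between runs.

    ```python
    from dataclasses import dataclass, field


    @dataclass
    class Checkpoint:
        """Remembers which (case ID, closed time) pairs were already forwarded."""

        seen: set = field(default_factory=set)

        def filter_new(self, records):
            """Return only records whose (id, closed_time) key is not yet recorded."""
            fresh = []
            for record in records:
                # Hypothetical field names; adapt to what the source API returns.
                key = (record["id"], record["closed_time"])
                if key not in self.seen:
                    self.seen.add(key)
                    fresh.append(record)
            return fresh
    ```

    With this key, a case that is reopened and closed again gets a new closed time,
    so it passes the filter once more, while an unchanged case fetched again is dropped.
    -->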
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/711071#M117457</link>
      <description>&lt;P&gt;Currently, we are not focusing on searches but rather on the application created to pull data from the API provided by the destination party.&lt;/P&gt;&lt;P&gt;Based on my understanding of the current setup, the new data is being retrieved by the application through the destination API.&lt;/P&gt;&lt;P&gt;The data includes fields such as ID, case status, case close date, and others.&lt;/P&gt;&lt;P&gt;At this point, duplicates will be identified based on the &lt;STRONG&gt;ID&lt;/STRONG&gt; field.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please correct me if I'm wrong, but given the current setup, wouldn't this result in duplicate data, since we are calling the API at an interval of 1 hour and each call retrieves 4 hours of logs?&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;10am,&amp;nbsp;6am-10am&lt;BR /&gt;11am,&amp;nbsp;11am-3pm&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2025 05:30:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/711071#M117457</guid>
      <dc:creator>ws</dc:creator>
      <dc:date>2025-02-10T05:30:56Z</dc:date>
    </item>
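    <!--
    Editor's sketch of the overlap concern raised in the post above. It assumes,
    as a reading of the "10am, 6am-10am" example, that each call looks back 4 hours
    from the call time; that assumption and the function names poll_window and
    overlap are illustrative only, not taken from the poster's app.

    ```python
    from datetime import datetime, timedelta


    def poll_window(call_time, lookback_hours=4):
        """Return the (start, end) range fetched by a call made at call_time."""
        return call_time - timedelta(hours=lookback_hours), call_time


    def overlap(first, second):
        """Return how much of two (start, end) ranges is fetched by both calls."""
        start = max(first[0], second[0])
        end = min(first[1], second[1])
        return max(end - start, timedelta(0))
    ```

    Under that assumption, calls made one hour apart share three hours of data,
    so without a checkpoint the same records would be fetched more than once.
    -->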
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/711073#M117459</link>
      <description>&lt;P&gt;I understand that Splunk checks whether the first 256 chars of the event are the same.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But in my current situation, would your recommendation be that we customize the application to implement a checkpoint mechanism for tracking previously indexed records?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2025 05:37:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/711073#M117459</guid>
      <dc:creator>ws</dc:creator>
      <dc:date>2025-02-10T05:37:27Z</dc:date>
    </item>
    <item>
      <title>Re: How do we define a duplicate record?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/711085#M117463</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/276234"&gt;@ws&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;you have many ways to check for repeated logs; the easiest is to save the logs in files with different names (e.g. adding date and time) and use the crcSalt = &amp;lt;SOURCE&amp;gt; option in the related inputs.conf stanza.&lt;/P&gt;&lt;P&gt;Ciao.&lt;/P&gt;&lt;P&gt;Giuseppe&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2025 07:18:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-do-we-define-a-duplicate-record/m-p/711085#M117463</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2025-02-10T07:18:20Z</dc:date>
    </item>
  </channel>
</rss>

