<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Removing duplicate entries considering multiple fields in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305764#M161579</link>
    <description>&lt;P&gt;The dedup command can work with multiple/composite fields. If today's data contains all of yesterday's records, why not just take the latest day's data so that there are no duplicates?&lt;/P&gt;</description>
    <pubDate>Thu, 30 Mar 2017 07:15:47 GMT</pubDate>
    <dc:creator>somesoni2</dc:creator>
    <dc:date>2017-03-30T07:15:47Z</dc:date>
    <item>
      <title>Removing duplicate entries considering multiple fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305763#M161578</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have a file containing 1000 records. Several fields have repeated values, e.g. camp_label, del_code, Rec_ID, Event time, etc.&lt;BR /&gt;
Even the timestamp has repeated values; in total there are 1000 records.&lt;BR /&gt;
When the same file is indexed again the next day with additional records (1000 + 500 new entries), I do not want the duplicate entries from the previous file. At present the index contains the duplicates as well (2500 records). How can I eliminate the duplicates efficiently? Will dedup with multiple fields work (like a composite key)? Or is there a better way? Please suggest.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 13:29:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305763#M161578</guid>
      <dc:creator>k_harini</dc:creator>
      <dc:date>2020-09-29T13:29:34Z</dc:date>
    </item>
    <item>
      <title>Re: Removing duplicate entries considering multiple fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305764#M161579</link>
      <description>&lt;P&gt;The dedup command can work with multiple/composite fields. If today's data contains all of yesterday's records, why not just take the latest day's data so that there are no duplicates?&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 07:15:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305764#M161579</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-03-30T07:15:47Z</dc:date>
    </item>
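To make the answer above concrete, here is a minimal SPL sketch of dedup over a composite key, using the field names from the question; the index and sourcetype names are placeholders, not from the original thread:

```
index=main sourcetype=campaign_data
| dedup camp_label del_code Rec_ID _time
```

Note that dedup removes duplicates only from the search results, keeping the first (most recent) event per unique field combination; it does not delete the duplicate events from the index itself.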
    <item>
      <title>Re: Removing duplicate entries considering multiple fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305765#M161580</link>
      <description>&lt;P&gt;In continuous monitoring mode, how do I take only the latest day's data? Not all the records repeat every time; if some do repeat, I want to discard those and keep only the new ones.&lt;BR /&gt;
Event timestamps are not unique.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 07:20:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305765#M161580</guid>
      <dc:creator>k_harini</dc:creator>
      <dc:date>2017-03-30T07:20:03Z</dc:date>
    </item>
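One possible way to "take only the latest day's data" at search time, as the earlier reply suggests, is to keep only the events from the most recent load by comparing on _indextime (when the event was indexed) rather than _time. This is a sketch under that assumption; the index and sourcetype names are placeholders:

```
index=main sourcetype=campaign_data
| eval load_day=strftime(_indextime, "%Y-%m-%d")
| eventstats max(load_day) as latest_day
| where load_day=latest_day
```

Because the `%Y-%m-%d` format sorts lexicographically in date order, `max(load_day)` picks the most recent load day even though the values are strings.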
    <item>
      <title>Re: Removing duplicate entries considering multiple fields</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305766#M161581</link>
      <description>&lt;P&gt;Assuming this is *NIX and you are monitoring file &lt;CODE&gt;file.txt&lt;/CODE&gt; and the file is updated once a day:&lt;/P&gt;

&lt;P&gt;Change &lt;CODE&gt;inputs.conf&lt;/CODE&gt; to look for &lt;CODE&gt;file.new&lt;/CODE&gt; instead of &lt;CODE&gt;file.txt&lt;/CODE&gt;.&lt;BR /&gt;
Add this cron job so it runs after the daily update (assuming the update happens at 2 AM, I scheduled the job for 3 AM):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;00 3 * * * /bin/diff -u file.prev file.txt | /bin/grep "^+" | /bin/awk 'NR&amp;gt;1' | /bin/sed "s/^+//" &amp;gt; file.new &amp;amp;&amp;amp; /bin/mv file.txt file.prev
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The first time that you set this up, do this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;mv file.txt file.prev
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 30 Mar 2017 07:21:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Removing-duplicate-entries-considering-multiple-fields/m-p/305766#M161581</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-03-30T07:21:30Z</dc:date>
    </item>
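The cron pipeline above can be tried out on small sample files to see what it actually emits. This sketch keeps the file names from the post (file.prev, file.txt, file.new) but the sample contents are invented for illustration; it assumes `diff -u` so that added lines carry a leading "+":

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf 'a\nb\nc\n' > file.prev        # yesterday's snapshot
printf 'a\nb\nc\nd\ne\n' > file.txt   # today's file: two new lines appended

# Same pipeline as the cron job: unified diff, keep only lines starting
# with "+", drop the "+++ file.txt" header line, strip the leading "+".
diff -u file.prev file.txt | grep '^+' | awk 'NR>1' | sed 's/^+//' > file.new

cat file.new                          # prints "d" and "e", the new lines only
mv file.txt file.prev                 # rotate for tomorrow's run
```

Splunk then monitors file.new, so each day it ingests only the delta rather than the whole file, and no duplicates reach the index in the first place.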
  </channel>
</rss>

