<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to remove duplicate events from the index, not just from search results? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332894#M164453</link>
    <description>&lt;P&gt;You will need to create a search that &lt;EM&gt;finds&lt;/EM&gt; your duplicated data and returns every copy except one (keeping the first or the last copy, depending on your needs).&lt;BR /&gt;
Once you are satisfied that your search correctly identifies ONLY the duplicate events, you can pipe the results to &lt;CODE&gt;|delete&lt;/CODE&gt;, which will &lt;STRONG&gt;&lt;EM&gt;remove&lt;/EM&gt;&lt;/STRONG&gt; those events from the index.&lt;BR /&gt;&lt;BR /&gt;
&lt;EM&gt;You will need to be a user with the &lt;CODE&gt;can_delete&lt;/CODE&gt; role (which grants the &lt;CODE&gt;delete_by_keyword&lt;/CODE&gt; capability). No user has this by default (not even admin), so you may need to add it to your user first. It is also a good idea to remove it when you have finished, to prevent accidents! (Been there.)&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;It's worth noting that this will &lt;STRONG&gt;not&lt;/STRONG&gt; remove the data from disk; it simply marks the events as deleted in the buckets, so they won't be returned by future searches.&lt;/P&gt;</description>
    <pubDate>Mon, 11 Dec 2017 08:49:50 GMT</pubDate>
    <dc:creator>nickhills</dc:creator>
    <dc:date>2017-12-11T08:49:50Z</dc:date>
    <item>
      <title>How to remove duplicate events from the index, not just from search results?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332893#M164452</link>
      <description>&lt;P&gt;I have a lot of data, including duplicates, and I want to remove the duplicate data from the index without using the &lt;CODE&gt;dedup&lt;/CODE&gt; command, since &lt;CODE&gt;dedup&lt;/CODE&gt; only removes events from the search results, not from the index. Can somebody help me?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 03:07:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332893#M164452</guid>
      <dc:creator>jadengoho</dc:creator>
      <dc:date>2017-12-11T03:07:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate events from the index, not just from search results?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332894#M164453</link>
      <description>&lt;P&gt;You will need to create a search that &lt;EM&gt;finds&lt;/EM&gt; your duplicated data and returns every copy except one (keeping the first or the last copy, depending on your needs).&lt;BR /&gt;
Once you are satisfied that your search correctly identifies ONLY the duplicate events, you can pipe the results to &lt;CODE&gt;|delete&lt;/CODE&gt;, which will &lt;STRONG&gt;&lt;EM&gt;remove&lt;/EM&gt;&lt;/STRONG&gt; those events from the index.&lt;BR /&gt;&lt;BR /&gt;
&lt;EM&gt;You will need to be a user with the &lt;CODE&gt;can_delete&lt;/CODE&gt; role (which grants the &lt;CODE&gt;delete_by_keyword&lt;/CODE&gt; capability). No user has this by default (not even admin), so you may need to add it to your user first. It is also a good idea to remove it when you have finished, to prevent accidents! (Been there.)&lt;/EM&gt;&lt;/P&gt;
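
&lt;P&gt;A minimal sketch of such a search, assuming a hypothetical index &lt;CODE&gt;your_index&lt;/CODE&gt;, a hypothetical sourcetype &lt;CODE&gt;your_sourcetype&lt;/CODE&gt; and an arbitrary time range, and treating events with an identical &lt;CODE&gt;_raw&lt;/CODE&gt; as duplicates. &lt;CODE&gt;streamstats&lt;/CODE&gt; numbers the copies of each identical raw event, so the first copy gets &lt;CODE&gt;copy=1&lt;/CODE&gt; and is left out, while every further copy is listed for review:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=your_index sourcetype=your_sourcetype earliest=-7d@d latest=@d
| streamstats count AS copy BY _raw
| where copy &gt; 1
| table _time index sourcetype source copy _raw
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Only after checking that every row really is an extra copy would you replace the final &lt;CODE&gt;table&lt;/CODE&gt; line with &lt;CODE&gt;| delete&lt;/CODE&gt;. Depending on your Splunk version, &lt;CODE&gt;delete&lt;/CODE&gt; may refuse to run after a command such as &lt;CODE&gt;streamstats&lt;/CODE&gt;, so check the &lt;CODE&gt;delete&lt;/CODE&gt; documentation for the restrictions that apply to you.&lt;/P&gt;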

&lt;P&gt;It's worth noting that this will &lt;STRONG&gt;not&lt;/STRONG&gt; remove the data from disk; it simply marks the events as deleted in the buckets, so they won't be returned by future searches.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 08:49:50 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332894#M164453</guid>
      <dc:creator>nickhills</dc:creator>
      <dc:date>2017-12-11T08:49:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate events from the index, not just from search results?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332895#M164454</link>
      <description>&lt;P&gt;@jadengoho, are these duplicates only in old data, or will your data keep getting duplicates in future as well? If the duplicates will continue, what is their source, cause, and frequency?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 08:55:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332895#M164454</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2017-12-11T08:55:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate events from the index, not just from search results?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332896#M164455</link>
      <description>&lt;P&gt;It is daily log data, so duplicates are a problem because they just keep stacking up.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 08:57:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332896#M164455</guid>
      <dc:creator>jadengoho</dc:creator>
      <dc:date>2017-12-11T08:57:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate events from the index, not just from search results?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332897#M164456</link>
      <description>&lt;P&gt;If you can fix the data during ingestion, that would be best. Otherwise, you can run a daily scheduled search (after the data has been ingested) that deduplicates the day's data with &lt;CODE&gt;dedup&lt;/CODE&gt; and pushes the results to a separate index. &lt;/P&gt;
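
&lt;P&gt;A rough sketch of such a scheduled search, assuming a hypothetical source index &lt;CODE&gt;your_index&lt;/CODE&gt; and a hypothetical target index &lt;CODE&gt;your_dedup_index&lt;/CODE&gt; that you have already created. It picks up yesterday's events, keeps one copy of each identical raw event, and writes the result with &lt;CODE&gt;collect&lt;/CODE&gt;; no &lt;CODE&gt;sourcetype&lt;/CODE&gt; is set, so the default &lt;CODE&gt;stash&lt;/CODE&gt; is used:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=your_index earliest=-1d@d latest=@d
| dedup _raw
| collect index=your_dedup_index testmode=true
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With &lt;CODE&gt;testmode=true&lt;/CODE&gt; nothing is written to the target index, so you can inspect the output first; remove it once the results look right, then schedule the search to run daily after ingestion. Your searches would then point at the deduplicated index instead of the original one.&lt;/P&gt;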

&lt;P&gt;Refer to Splunk Documentation: &lt;A href="https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#Moving_events_to_a_different_index"&gt;https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect#Moving_events_to_a_different_index&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;PS: &lt;BR /&gt;
You can use the &lt;CODE&gt;collect&lt;/CODE&gt; command to do this; however, to me it seems like overhead compared with fixing the data before it is indexed. &lt;BR /&gt;
You could also consider a scripted input for this, in case there is no other way to prevent duplicate events from being indexed.&lt;BR /&gt;
If you use the &lt;CODE&gt;collect&lt;/CODE&gt; command with a &lt;CODE&gt;sourcetype&lt;/CODE&gt; other than &lt;CODE&gt;stash&lt;/CODE&gt;, the collected data will count against your license.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 10:00:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332897#M164456</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2017-12-11T10:00:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate events from the index, not just from search results?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332898#M164457</link>
      <description>&lt;P&gt;I have the same problem. Do I need to use a script to fix this issue? If so, what kind of script should I use?&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 01:49:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-remove-duplicate-events-in-INDEX-not-on-Search/m-p/332898#M164457</guid>
      <dc:creator>mjlsnombrado</dc:creator>
      <dc:date>2017-12-12T01:49:57Z</dc:date>
    </item>
  </channel>
</rss>

