<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I want to delete duplicates from the splunk index which have same _raw and same _time in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247772#M189158</link>
    <description>&lt;P&gt;We are trying to write the event IDs of duplicate records to a CSV file. However, when we run the command below, it returns no results.&lt;/P&gt;

&lt;P&gt;index=idx1 sourcetype=csv|stats count by _raw _time |where cnt&amp;gt;1|eval eid=_cd&lt;STRONG&gt;&lt;EM&gt;|stats count by eid|fields - count|outputlookup filename.csv&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;But when we run it without the part in bold, it gives the correct count.&lt;BR /&gt;
So we cannot figure out what is wrong with the latter part of the query, highlighted in bold.&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 10:45:47 GMT</pubDate>
    <dc:creator>ashutoshsharma1</dc:creator>
    <dc:date>2020-09-29T10:45:47Z</dc:date>
    <item>
      <title>I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247766#M189152</link>
      <description>&lt;P&gt;I tried the approaches from an already answered question on Splunk Answers on the same topic, which suggest using a lookup or a subsearch.&lt;/P&gt;

&lt;P&gt;Using delete directly after streamstats gives an error:&lt;/P&gt;

&lt;P&gt;index=idx1 sourcetype=csv|streamstats count by _raw _time |where cnt&amp;gt;1|delete&lt;/P&gt;

&lt;P&gt;Both of the methods below give me the wrong output: I have many duplicates, but these commands show only a few.&lt;BR /&gt;
Subsearch method:&lt;/P&gt;

&lt;P&gt;index=idx1 sourcetype=csv| eval eid=_cd|search[ |streamstats count as cnt by _raw _time | where cnt&amp;gt;1| field eid ]|delete&lt;/P&gt;

&lt;P&gt;Lookup method:&lt;/P&gt;

&lt;P&gt;index=idx1 sourcetype=csv|streamstats count by _raw _time |where cnt&amp;gt;1| eid=_cd | stats count by eid| fields eid| outputcsv del_id.csv&lt;/P&gt;

&lt;P&gt;index=id1 sourcetype=csv|eval eid=_cd | search [ del_id.csv ] | delete&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 10:44:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247766#M189152</guid>
      <dc:creator>ashutoshsharma1</dc:creator>
      <dc:date>2020-09-29T10:44:12Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247767#M189153</link>
      <description>&lt;P&gt;This will give you the data you need:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=bar | stats count by _time _raw | where count &amp;gt; 1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;For large amounts of data this might take a while.&lt;/P&gt;
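
&lt;P&gt;As a rough sanity check before deleting anything, a sketch like the following should total up how many surplus copies exist, that is, every copy beyond the first in each group. The field names &lt;CODE&gt;surplus&lt;/CODE&gt; and &lt;CODE&gt;events_to_delete&lt;/CODE&gt; are made up for illustration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=bar | stats count by _time _raw | where count &amp;gt; 1 | eval surplus = count - 1 | stats sum(surplus) as events_to_delete
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The number of events an eventual delete reports should line up with this figure.&lt;/P&gt;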

&lt;P&gt;To actually delete, you cannot use nonstreaming commands - you'd need to write the &lt;CODE&gt;_bkt&lt;/CODE&gt; and &lt;CODE&gt;_cd&lt;/CODE&gt; fields into a lookup, and then search like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=bar [inputlookup that_lookup] | delete
&lt;/CODE&gt;&lt;/PRE&gt;
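
&lt;P&gt;A cautious dry run, assuming the lookup is named &lt;CODE&gt;that_lookup&lt;/CODE&gt; as above: run the same search with a count in place of the &lt;CODE&gt;delete&lt;/CODE&gt; and check that it matches only the events you expect to remove:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=bar [inputlookup that_lookup] | stats count
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Also keep in mind that &lt;CODE&gt;delete&lt;/CODE&gt; needs a role with the delete capability (such as &lt;CODE&gt;can_delete&lt;/CODE&gt;), and that it only makes events unsearchable rather than freeing disk space.&lt;/P&gt;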

&lt;P&gt;Make sure to test this on unimportant data first!&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 08:34:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247767#M189153</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-26T08:34:25Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247768#M189154</link>
      <description>&lt;P&gt;Add |delete at the end:&lt;BR /&gt;
    index=idx1 sourcetype=csv|stats count by _raw _time |where count &amp;gt; 1|delete&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 08:43:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247768#M189154</guid>
      <dc:creator>inventsekar</dc:creator>
      <dc:date>2016-08-26T08:43:55Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247769#M189155</link>
      <description>&lt;P&gt;I'd be careful about that. What is &lt;CODE&gt;delete&lt;/CODE&gt; supposed to do after a reporting command such as &lt;CODE&gt;stats&lt;/CODE&gt;? Delete both inputs to the &lt;CODE&gt;stats&lt;/CODE&gt;? Delete only one? Which one?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 08:47:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247769#M189155</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-26T08:47:32Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247770#M189156</link>
      <description>&lt;P&gt;OK. Since the question was "I want to delete duplicates from the splunk index which have same _raw and same _time", I thought that after finding "the data you need" with your first query, he should use the |delete command.&lt;BR /&gt;
Only now did it occur to me to ask why delete would come after stats.&lt;BR /&gt;
Thanks a lot, good learning!&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 09:18:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247770#M189156</guid>
      <dc:creator>inventsekar</dc:creator>
      <dc:date>2016-08-26T09:18:24Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247771#M189157</link>
      <description>&lt;P&gt;@martin_mueller &lt;BR /&gt;
Sir, I will try your answer.&lt;BR /&gt;
Can you please help me understand why I am not getting the correct result using streamstats and count&amp;gt;1? I don't have much understanding of how streamstats works, so this would be good learning.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 09:32:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247771#M189157</guid>
      <dc:creator>ashutoshsharma1</dc:creator>
      <dc:date>2016-08-26T09:32:37Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247772#M189158</link>
      <description>&lt;P&gt;We are trying to write the event IDs of duplicate records to a CSV file. However, when we run the command below, it returns no results.&lt;/P&gt;

&lt;P&gt;index=idx1 sourcetype=csv|stats count by _raw _time |where cnt&amp;gt;1|eval eid=_cd&lt;STRONG&gt;&lt;EM&gt;|stats count by eid|fields - count|outputlookup filename.csv&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;But when we run it without the part in bold, it gives the correct count.&lt;BR /&gt;
So we cannot figure out what is wrong with the latter part of the query, highlighted in bold.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 10:45:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247772#M189158</guid>
      <dc:creator>ashutoshsharma1</dc:creator>
      <dc:date>2020-09-29T10:45:47Z</dc:date>
    </item>
    <item>
      <title>Re: I want to delete duplicates from the splunk index which have same _raw and same _time</title>
      <link>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247773#M189159</link>
      <description>&lt;P&gt;Streamstats is a centralized streaming command that gets executed on the search head. The delete has to run on the indexers though, since they're the ones storing and therefore deleting the data. As a result, you can only run distributable streaming commands before a delete.&lt;BR /&gt;
See &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchReference/Commandsbytype"&gt;http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchReference/Commandsbytype&lt;/A&gt; for reference.&lt;/P&gt;
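
&lt;P&gt;To see what &lt;CODE&gt;streamstats&lt;/CODE&gt; actually does, here is a sketch that only lists events and deletes nothing. It reuses the placeholder index and sourcetype, and &lt;CODE&gt;cnt&lt;/CODE&gt;, &lt;CODE&gt;cd&lt;/CODE&gt; and &lt;CODE&gt;bkt&lt;/CODE&gt; are illustrative names. &lt;CODE&gt;streamstats&lt;/CODE&gt; numbers the events in each &lt;CODE&gt;_raw&lt;/CODE&gt;/&lt;CODE&gt;_time&lt;/CODE&gt; group as they stream past, so the first copy seen gets 1 and later copies get 2, 3, and so on. The counter is called &lt;CODE&gt;count&lt;/CODE&gt; unless you rename it with &lt;CODE&gt;as&lt;/CODE&gt;, so a &lt;CODE&gt;where cnt&amp;gt;1&lt;/CODE&gt; without &lt;CODE&gt;as cnt&lt;/CODE&gt; matches nothing:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=bar | streamstats count as cnt by _raw _time | where cnt &amp;gt; 1 | eval cd = _cd | eval bkt = _bkt | table _time cnt cd bkt _raw
&lt;/CODE&gt;&lt;/PRE&gt;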

&lt;P&gt;As for your second comment, the &lt;CODE&gt;stats&lt;/CODE&gt; only produces three fields: &lt;CODE&gt;_time _raw count&lt;/CODE&gt;. There are no &lt;CODE&gt;_bkt&lt;/CODE&gt; or &lt;CODE&gt;_cd&lt;/CODE&gt; fields present. One approach to copy them over might work something like this (untested):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=bar | stats count list(_cd) as _cd list(_bkt) as _bkt by _time _raw | where count &amp;gt; 1 | eval _cd = mvindex(_cd, 1, -1) | eval _bkt = mvindex(_bkt, 1, -1) | eval zip = mvzip(_cd, _bkt, "##breaker##") | mvexpand zip | eval zip = split(zip, "##breaker##") | eval _cd = mvindex(zip, 0) | eval _bkt = mvindex(zip, 1) | fields - zip count | outputlookup ...
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The &lt;CODE&gt;list()&lt;/CODE&gt; calls tell &lt;CODE&gt;stats&lt;/CODE&gt; to keep a list of the two fields identifying an event, and the first two &lt;CODE&gt;mvindex()&lt;/CODE&gt; calls should keep all but the first element in each list. The whole &lt;CODE&gt;mvzip|mvexpand|split|mvindex&lt;/CODE&gt; part should put every event into its own row.&lt;/P&gt;
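
&lt;P&gt;To get a feel for the multivalue steps without touching any index, a toy version along these lines can be run on its own, assuming a Splunk version that has &lt;CODE&gt;makeresults&lt;/CODE&gt;. The values in &lt;CODE&gt;cd&lt;/CODE&gt; and &lt;CODE&gt;bkt&lt;/CODE&gt; are invented stand-ins for three copies of one event:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval cd = split("1:10,1:20,1:30", ",") | eval bkt = split("bktA,bktA,bktB", ",") | eval cd = mvindex(cd, 1, -1) | eval bkt = mvindex(bkt, 1, -1) | eval zip = mvzip(cd, bkt, "##breaker##") | mvexpand zip | eval zip = split(zip, "##breaker##") | eval cd = mvindex(zip, 0) | eval bkt = mvindex(zip, 1) | table cd bkt
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The intent is that the first pair is dropped and the remaining two pairs each end up in a row of their own, which is the shape the lookup needs.&lt;/P&gt;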

&lt;P&gt;Again, important reminder: Don't run this without first extensively testing on unimportant data!&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2016 11:19:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/I-want-to-delete-duplicates-from-the-splunk-index-which-have/m-p/247773#M189159</guid>
      <dc:creator>martin_mueller</dc:creator>
      <dc:date>2016-08-26T11:19:31Z</dc:date>
    </item>
  </channel>
</rss>

