<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to remove duplicate data and how to prevent having duplicate data? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216163#M42553</link>
    <description>&lt;P&gt;I have permission to use the delete command; the problem is that I don't know how to select the appropriate data for deleting.&lt;/P&gt;</description>
    <pubDate>Fri, 30 Oct 2015 17:12:35 GMT</pubDate>
    <dc:creator>edrivera3</dc:creator>
    <dc:date>2015-10-30T17:12:35Z</dc:date>
    <item>
      <title>How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216157#M42547</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;

&lt;P&gt;I have many configuration text files, each of which basically looks like this:&lt;BR /&gt;
Owner Name: AAAAA AAAAA&lt;BR /&gt;
Product Name: AAAA AAAA&lt;BR /&gt;
Product ID: NNNNN-NN     Serial ID: NN-NN-NN-NNNNN&lt;/P&gt;

&lt;P&gt;Sometimes there is a change in the product ID or serial ID, and I want to index the new version without keeping the old event. Basically, I want to replace the old configuration file with the new one.&lt;/P&gt;

&lt;P&gt;I tried the inputs.conf below because some files were not getting indexed due to the similarity between them. Everything was fine until I found out that every time a configuration text file changes, the file is indexed again but does not replace the old one. So now I have multiple versions of each configuration file with the same source, which is a problem.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;[Monitor://Some directory]
index = my_index
sourcetype = my_sourcetype
crcSalt = &amp;lt;SOURCE&amp;gt;
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(1) Right now I need to  delete all events that have already a new version of them based on the _indextime.&lt;BR /&gt;
(2) I need a new inputs.conf setup that will prevent this behavior.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 15:43:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216157#M42547</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T15:43:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216158#M42548</link>
      <description>&lt;P&gt;Splunk does not have an "index this only if it's not already indexed" feature. The performance of such a feature would probably be poor. Nor will it replace or update anything already indexed.&lt;BR /&gt;
You can remove duplicate data (or any data) by piping a search to the &lt;CODE&gt;delete&lt;/CODE&gt; command.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:00:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216158#M42548</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2015-10-30T16:00:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216159#M42549</link>
      <description>&lt;P&gt;Ok. That's too bad. But how can I make Splunk delete events that have a newer version? I know about the delete command, but I haven't been able to select the appropriate data. With the command below I've been able to see which sources have more than one event. &lt;BR /&gt;
    ... |  eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | stats count by source| where count&amp;gt;1 &lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:07:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216159#M42549</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T16:07:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216160#M42550</link>
      <description>&lt;P&gt;This should be the opposite of dedup:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | eventstats max(_indextime) AS latestIndexTime by source | where _indextime&amp;lt;latestIndexTime
&lt;/CODE&gt;&lt;/PRE&gt;
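
&lt;P&gt;Before deleting anything, it may help to preview how many older events that search selects for each source. This is only a sketch: &lt;CODE&gt;index=my_index sourcetype=my_sourcetype&lt;/CODE&gt; is assumed from the inputs.conf in the question, and &lt;CODE&gt;staleEvents&lt;/CODE&gt; is just an illustrative field name.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=my_index sourcetype=my_sourcetype | eventstats max(_indextime) AS latestIndexTime by source | where _indextime&amp;lt;latestIndexTime | stats count AS staleEvents by source
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Once the counts look right, go back to the selecting search above (without the final &lt;CODE&gt;stats&lt;/CODE&gt;).&lt;/P&gt;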

&lt;P&gt;Then you just pipe that to &lt;CODE&gt;delete&lt;/CODE&gt; by adding this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | delete
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:44:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216160#M42550</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-10-30T16:44:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216161#M42551</link>
      <description>&lt;P&gt;The only way I have found is deleting one file at a time, which is very inefficient. &lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 16:51:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216161#M42551</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T16:51:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216162#M42552</link>
      <description>&lt;P&gt;You need to have permission to use the &lt;CODE&gt;delete&lt;/CODE&gt; command.  That's the best way to remove events from Splunk.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:10:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216162#M42552</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2015-10-30T17:10:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216163#M42553</link>
      <description>&lt;P&gt;I have permission to use the delete command; the problem is that I don't know how to select the appropriate data for deleting.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:12:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216163#M42553</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T17:12:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216164#M42554</link>
      <description>&lt;P&gt;See my Answer!&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:22:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216164#M42554</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-10-30T17:22:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216165#M42555</link>
      <description>&lt;P&gt;Your command is perfect for selecting the events, but I encountered the following error when I added the delete command.&lt;BR /&gt;
Error in 'delete' command: This command cannot be invoked after the non-streaming command 'eventstats'. &lt;BR /&gt;
The search job has failed due to an error. You may be able view the job in the Job Inspector. &lt;/P&gt;

&lt;P&gt;I am going to try running the command again, but for some reason it takes a very long time to run.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:28:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216165#M42555</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T17:28:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216166#M42556</link>
      <description>&lt;P&gt;I got the same error. My roles are can_delete, user, power.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:32:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216166#M42556</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T17:32:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216167#M42557</link>
      <description>&lt;P&gt;The first part of the query can be as simple as &lt;CODE&gt;index=foo sourcetype=my_sourcetype&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:33:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216167#M42557</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2015-10-30T17:33:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216168#M42558</link>
      <description>&lt;P&gt;Try this instead.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=foo sourcetype=my_sourcetype | eval oldest=relative_time(now(),"-1d@d") | where _indextime&amp;lt;oldest
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Adjust the arguments to relative_time as needed.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:42:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216168#M42558</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2015-10-30T17:42:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216169#M42559</link>
      <description>&lt;P&gt;My solution will not work then, because evidently the use of &lt;CODE&gt;eventstats&lt;/CODE&gt; precludes the use of &lt;CODE&gt;delete&lt;/CODE&gt; (which, IMHO, is definitely a bug).&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:52:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216169#M42559</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-10-30T17:52:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216170#M42560</link>
      <description>&lt;P&gt;I don't see how this solves my problem. Could you explain your solution in more detail?&lt;BR /&gt;
now() = the time when the search started&lt;BR /&gt;
oldest = 1 day before the search started, snapped to the start of that day&lt;/P&gt;
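
&lt;P&gt;If I understand the &lt;CODE&gt;-1d@d&lt;/CODE&gt; modifier correctly, it goes back one day and then snaps to midnight. A small sketch to check the value, assuming a Splunk version that has &lt;CODE&gt;makeresults&lt;/CODE&gt; (&lt;CODE&gt;oldest_readable&lt;/CODE&gt; is just an illustrative field name):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval oldest=relative_time(now(),"-1d@d") | eval oldest_readable=strftime(oldest,"%Y-%m-%d %H:%M:%S") | table oldest oldest_readable
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;So the search would select everything indexed before that cutoff, whether or not a newer version of the same source exists.&lt;/P&gt;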

&lt;P&gt;Basically I need a solution that provides the same results as woodcock's solution but without using eventstats.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 17:52:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216170#M42560</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T17:52:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216171#M42561</link>
      <description>&lt;P&gt;OK, try this then:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... NOT [... | sort - _indextime | dedup source | fields _raw]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If this looks correct (has only the bad stuff), then it should be safe to pipe this to &lt;CODE&gt;| delete&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 18:19:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216171#M42561</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-10-30T18:19:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216172#M42562</link>
      <description>&lt;P&gt;Your idea is very interesting, but it didn't work and I am not sure why.&lt;/P&gt;

&lt;P&gt;The subsearch on the right side returns all the good data, so the Boolean operator NOT should remove the good data from the total, leaving only the bad data.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 18:52:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216172#M42562</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T18:52:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216173#M42563</link>
      <description>&lt;P&gt;What is the result of this search?&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | sort - _indextime | dedup source | fields _raw | format
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;It should have 1 field called &lt;CODE&gt;search&lt;/CODE&gt; that holds a list of &lt;CODE&gt;_raw&lt;/CODE&gt; values joined by &lt;CODE&gt;OR&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 19:05:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216173#M42563</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-10-30T19:05:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216174#M42564</link>
      <description>&lt;P&gt;This search provided all the right data. If I look in Statistics I have a table with one row:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;_raw                           search
                                    NOT ()
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 30 Oct 2015 19:11:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216174#M42564</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2015-10-30T19:11:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216175#M42565</link>
      <description>&lt;P&gt;That's all I've got.  Play around and see if you can make it work and update this Q&amp;amp;A with what you find.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Oct 2015 19:24:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216175#M42565</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2015-10-30T19:24:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicate data and how to prevent having duplicate data?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216176#M42566</link>
      <description>&lt;P&gt;This solution is based on woodcock's idea.&lt;/P&gt;

&lt;P&gt;I removed the &lt;CODE&gt;| sort - indextime&lt;/CODE&gt; because it wasn't running correctly for me.&lt;BR /&gt;
I replaced the _raw field with two fields from my data plus the indextime field.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;... | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | search index=* NOT [...| eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | dedup source | fields field1, field2, indextime]&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 29 Sep 2020 07:47:20 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/How-to-remove-duplicate-data-and-how-to-prevent-having-duplicate/m-p/216176#M42566</guid>
      <dc:creator>edrivera3</dc:creator>
      <dc:date>2020-09-29T07:47:20Z</dc:date>
    </item>
  </channel>
</rss>

