<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I check for the existence of an event before indexing to avoid duplicate events? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364562#M162971</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I'm busy trying to find a way to ensure that duplicate records are not indexed.  So far all I've managed to do is find a search that will remove duplicate values once they have been indexed (and consumed license):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index_name"
| eval eid=_cd
| search [ search index="index_name"
 | streamstats count by _raw
 | search count&amp;gt;1
 | eval eid=_cd
 | fields eid ]
| delete
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Is there any way in &lt;CODE&gt;transforms.conf&lt;/CODE&gt; or &lt;CODE&gt;props.conf&lt;/CODE&gt; to check for the existence of an event before deciding to index?&lt;/P&gt;

&lt;P&gt;Thank you and best regards,&lt;/P&gt;

&lt;P&gt;Andrew&lt;/P&gt;</description>
    <pubDate>Mon, 12 Feb 2018 06:35:07 GMT</pubDate>
    <dc:creator>andrewtrobec</dc:creator>
    <dc:date>2018-02-12T06:35:07Z</dc:date>
    <item>
      <title>How do I check for the existence of an event before indexing to avoid duplicate events?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364562#M162971</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I'm busy trying to find a way to ensure that duplicate records are not indexed.  So far all I've managed to do is find a search that will remove duplicate values once they have been indexed (and consumed license):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index_name"
| eval eid=_cd
| search [ search index="index_name"
 | streamstats count by _raw
 | search count&amp;gt;1
 | eval eid=_cd
 | fields eid ]
| delete
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Is there any way in &lt;CODE&gt;transforms.conf&lt;/CODE&gt; or &lt;CODE&gt;props.conf&lt;/CODE&gt; to check for the existence of an event before deciding to index?&lt;/P&gt;

&lt;P&gt;Thank you and best regards,&lt;/P&gt;

&lt;P&gt;Andrew&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2018 06:35:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364562#M162971</guid>
      <dc:creator>andrewtrobec</dc:creator>
      <dc:date>2018-02-12T06:35:07Z</dc:date>
    </item>
    <item>
      <title>Re: How do I check for the existence of an event before indexing to avoid duplicate events?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364563#M162972</link>
      <description>&lt;P&gt;Hi andrewtrobec,&lt;BR /&gt;
No, there isn't any way to configure Splunk for this. Splunk already checks whether it has already indexed a file (the fishbucket), but if the same event appears in two different files, it gets indexed twice!&lt;/P&gt;

&lt;P&gt;The only way I can think of (though I haven't tried it!) is to use the SDK to run a search that checks whether an event already exists before indexing it. As you can imagine, this would make the ingestion process very slow. In addition, what time range would the check search cover: one minute, one hour, or more? There's a high risk of overloading your system, so the extra license usage will probably cost you less than the checking would!&lt;/P&gt;

&lt;P&gt;Also, deleting redundant logs that way is dangerous because you risk deleting good events! It would probably be better to dedup results at search time. Remember that the "delete" command doesn't free any disk space, because it's a logical deletion, not a physical one!&lt;/P&gt;
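
&lt;P&gt;A minimal sketch of what search-time deduplication could look like, assuming duplicates are events whose &lt;CODE&gt;_raw&lt;/CODE&gt; text is exactly identical and reusing the &lt;CODE&gt;index_name&lt;/CODE&gt; placeholder from the question:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index_name"
| dedup _raw
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The duplicates stay in the index and are only dropped from the search results, so nothing is lost if the duplicate test turns out to be too aggressive.&lt;/P&gt;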

&lt;P&gt;I think you should first check how much extra license the duplicates actually consume (it probably isn't that much), and then try to re-engineer your inputs.&lt;/P&gt;
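
&lt;P&gt;One rough way to size the problem, again assuming exact duplicates share an identical &lt;CODE&gt;_raw&lt;/CODE&gt;. The field names &lt;CODE&gt;extra_events&lt;/CODE&gt;, &lt;CODE&gt;extra_bytes&lt;/CODE&gt;, &lt;CODE&gt;duplicate_events&lt;/CODE&gt; and &lt;CODE&gt;approx_duplicate_bytes&lt;/CODE&gt; are made up for illustration, and &lt;CODE&gt;len(_raw)&lt;/CODE&gt; counts characters, so treat the byte figure as an approximation and not as the licensed volume:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="index_name"
| stats count by _raw
| where count &amp;gt; 1
| eval extra_events=count-1, extra_bytes=len(_raw)*(count-1)
| stats sum(extra_events) as duplicate_events sum(extra_bytes) as approx_duplicate_bytes
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Running it over a limited, representative time range keeps the search cheap and should be enough to judge whether the duplicates are worth worrying about.&lt;/P&gt;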

&lt;P&gt;Bye.&lt;BR /&gt;
Giuseppe&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2018 10:13:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364563#M162972</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2018-02-12T10:13:19Z</dc:date>
    </item>
    <item>
      <title>Re: How do I check for the existence of an event before indexing to avoid duplicate events?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364564#M162973</link>
      <description>&lt;P&gt;@gcusello Thanks for the information, very useful.  Is there a way to physically delete logically deleted events?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2018 10:46:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364564#M162973</guid>
      <dc:creator>andrewtrobec</dc:creator>
      <dc:date>2018-02-12T10:46:53Z</dc:date>
    </item>
    <item>
      <title>Re: How do I check for the existence of an event before indexing to avoid duplicate events?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364565#M162974</link>
      <description>&lt;P&gt;Hi andrewtrobec,&lt;BR /&gt;
No, to my knowledge the only way to physically delete events from an index is the "splunk clean eventdata -index index_name" command, but that deletes the entire index.&lt;/P&gt;

&lt;P&gt;Otherwise you have to wait for the retention period to expire!&lt;BR /&gt;
For this reason the delete command isn't a good way to remove events; it's better to keep the events and dedup them at search time!&lt;/P&gt;

&lt;P&gt;Bye.&lt;BR /&gt;
Giuseppe&lt;/P&gt;</description>
      <pubDate>Mon, 12 Feb 2018 11:25:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-check-for-the-existence-of-an-event-before-indexing-to/m-p/364565#M162974</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2018-02-12T11:25:25Z</dc:date>
    </item>
  </channel>
</rss>

