<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I deduplicate events with such conditions? in Splunk Dev</title>
    <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288255#M3576</link>
    <description>&lt;P&gt;How about just running dedup on host?&lt;/P&gt;</description>
    <pubDate>Wed, 23 Aug 2017 20:53:16 GMT</pubDate>
    <dc:creator>somesoni2</dc:creator>
    <dc:date>2017-08-23T20:53:16Z</dc:date>
    <item>
      <title>How do I deduplicate events with such conditions?</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288253#M3574</link>
      <description>&lt;P&gt;I have multiple custom data sources, mainly scripts, that send events to Splunk on a schedule.&lt;BR /&gt;
I can distinguish each execution of these sources by either a timestamp or a custom ID, which is incremented with every execution and captured in every event. The events always have a proper host field, which together with that custom ID contributes to the "unique key" of an event. The hosts also carry custom attributes, which form the third part of what could be used as a unique key. These attributes are present in the events for as long as they apply to a given host, and are no longer present once they stop applying.&lt;/P&gt;

&lt;P&gt;An example of what I mean (every line is a separate event):&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute1, customid=customid1&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute2, customid=customid1&lt;/LI&gt;
&lt;LI&gt;hostID=host2, attributeID=attribute1, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute1, customid=customid2&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;(Because of the _time field, these would of course appear in Splunk in reverse order.)&lt;/P&gt;

&lt;P&gt;I want to deduplicate these events so that I keep only the data from the most recent execution of a script. From the above example, I want to keep only&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;host2, attribute1, customid2&lt;/LI&gt;
&lt;LI&gt;host1, attribute1, customid2&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;If I were to use &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| dedup hostID, attributeID, customid
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;it would give me&lt;BR /&gt;
- host1, attribute2, customid1&lt;BR /&gt;
- host2, attribute1, customid2&lt;BR /&gt;
- host1, attribute1, customid2&lt;/P&gt;

&lt;P&gt;The solution my team came up with is&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;base search&amp;gt; | eventstats max(customid) as max_customid by hostID | where customid=max_customid
&lt;/CODE&gt;&lt;/PRE&gt;
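
&lt;P&gt;To make this reproducible without my data, here is a minimal sketch that generates the four events from the example above and applies the same idea. The counter n and the numeric customid values (1 and 2 instead of customid1 and customid2) are made up for the demo, and the comparison uses where, since that command compares two fields with each other:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=4
| streamstats count as n
| eval hostID=if(n=3, "host2", "host1")
| eval attributeID=if(n=2, "attribute2", "attribute1")
| eval customid=if(n=1 OR n=2, 1, 2)
| eventstats max(customid) as max_customid by hostID
| where customid=max_customid
| table hostID attributeID customid
&lt;/CODE&gt;&lt;/PRE&gt;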

&lt;P&gt;This pretty much does the job, but I feel it is not efficient. What would be the right approach to do this?&lt;/P&gt;

&lt;H1&gt;EDIT&lt;/H1&gt;

&lt;P&gt;A given host can have multiple events (with different attributes) from the same execution of the script.&lt;BR /&gt;
As a more detailed example, let's say I have these events:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute1, customid=customid1&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute2, customid=customid1&lt;/LI&gt;
&lt;LI&gt;hostID=host2, attributeID=attribute1, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute1, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute3, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute4, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host2, attributeID=attribute3, customid=customid2&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;I want to keep the events below:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;hostID=host2, attributeID=attribute1, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute1, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute3, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host1, attributeID=attribute4, customid=customid2&lt;/LI&gt;
&lt;LI&gt;hostID=host2, attributeID=attribute3, customid=customid2&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;This is why I can't use stats first(): it returns only one value per host, but I need every event from a host's latest execution.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Aug 2017 20:12:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288253#M3574</guid>
      <dc:creator>szabados</dc:creator>
      <dc:date>2017-08-23T20:12:07Z</dc:date>
    </item>
    <item>
      <title>Re: How do I deduplicate events with such conditions?</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288254#M3575</link>
      <description>&lt;P&gt;Have you tried the "first" function with the stats command?  &lt;CODE&gt;&amp;lt;base search&amp;gt; | eval myKey=attributeID.customid | stats first(myKey) by hostID&lt;/CODE&gt; &lt;/P&gt;</description>
      <pubDate>Wed, 23 Aug 2017 20:49:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288254#M3575</guid>
      <dc:creator>s2_splunk</dc:creator>
      <dc:date>2017-08-23T20:49:16Z</dc:date>
    </item>
    <item>
      <title>Re: How do I deduplicate events with such conditions?</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288255#M3576</link>
      <description>&lt;P&gt;How about just running dedup on host?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Aug 2017 20:53:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288255#M3576</guid>
      <dc:creator>somesoni2</dc:creator>
      <dc:date>2017-08-23T20:53:16Z</dc:date>
    </item>
    <item>
      <title>Re: How do I deduplicate events with such conditions?</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288256#M3577</link>
      <description>&lt;P&gt;Unfortunately, that is not what I need. Please see my update on the original post above.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Aug 2017 07:29:40 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288256#M3577</guid>
      <dc:creator>szabados</dc:creator>
      <dc:date>2017-08-24T07:29:40Z</dc:date>
    </item>
    <item>
      <title>Re: How do I deduplicate events with such conditions?</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288257#M3578</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;&amp;lt;base search&amp;gt; | eval myKey=hostID.attributeID.customid | dedup myKey
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This should do what you want. Dedup keeps the most recent event for each value of the combined key.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Aug 2017 08:30:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288257#M3578</guid>
      <dc:creator>s2_splunk</dc:creator>
      <dc:date>2017-08-24T08:30:04Z</dc:date>
    </item>
    <item>
      <title>Re: How do I deduplicate events with such conditions?</title>
      <link>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288258#M3579</link>
      <description>&lt;P&gt;Let's baseline.  These stats function pairs are similar: &lt;CODE&gt;first&lt;/CODE&gt;/&lt;CODE&gt;last&lt;/CODE&gt;, &lt;CODE&gt;earliest&lt;/CODE&gt;/&lt;CODE&gt;latest&lt;/CODE&gt;, &lt;CODE&gt;min&lt;/CODE&gt;/&lt;CODE&gt;max&lt;/CODE&gt;.   The last pair is, I think, obvious, but the first pair is not the same as the second pair, which is what many people assume at first.  If your events have not been re-sorted, they should (and this is a big "should" because sometimes Splunk fails to do this and doesn't always generate a warning) come back to you sorted in "newest to oldest" order with the newest on top.  In such a case, &lt;CODE&gt;first&lt;/CODE&gt; does the same thing as &lt;CODE&gt;latest&lt;/CODE&gt;.  Let that sink in: &lt;CODE&gt;first&lt;/CODE&gt; DOES NOT do the same thing as &lt;CODE&gt;earliest&lt;/CODE&gt;; it does the OPPOSITE.  That is because what &lt;CODE&gt;first&lt;/CODE&gt; actually does is walk backwards in time through your events, starting from the top (which by default should be the "latest" event), and grab the "first" one that it sees.&lt;/P&gt;
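
&lt;P&gt;Here is a quick way to see this for yourself, as a sketch on generated events rather than real data. The field n and the one-minute spacing are made up for the demo; event 1 is the newest and stays on top:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults count=3
| streamstats count as n
| eval _time=_time - (n * 60)
| stats first(n) as first_n latest(n) as latest_n last(n) as last_n earliest(n) as earliest_n
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I would expect first_n and latest_n to agree, and last_n and earliest_n to agree. Put a &lt;CODE&gt;sort 0 _time&lt;/CODE&gt; in front of the stats and &lt;CODE&gt;first&lt;/CODE&gt;/&lt;CODE&gt;last&lt;/CODE&gt; should swap while &lt;CODE&gt;latest&lt;/CODE&gt;/&lt;CODE&gt;earliest&lt;/CODE&gt; stay put.&lt;/P&gt;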

&lt;P&gt;OK, so for your case, simply sort your events the way that you desire (you can have multiple layers of sort by using more than 1 field argument) and then use &lt;CODE&gt;first&lt;/CODE&gt; or &lt;CODE&gt;dedup&lt;/CODE&gt;.&lt;/P&gt;
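
&lt;P&gt;As a rough sketch of that idea, using the field names from the question and assuming customid sorts the way you expect (for example, it is numeric): put each host's highest customid on top, then let &lt;CODE&gt;dedup&lt;/CODE&gt; keep the first event it sees for each hostID/attributeID pair.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;base search&amp;gt;
| sort 0 hostID, -customid
| dedup hostID attributeID
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note that this keeps the newest event per pair, so an attribute that was last reported in an older execution would still survive; whether that is acceptable depends on your data.&lt;/P&gt;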

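&lt;P&gt;Pro tip: be sure that you use &lt;CODE&gt;sort 0&lt;/CODE&gt;, not just &lt;CODE&gt;sort&lt;/CODE&gt;; plain &lt;CODE&gt;sort&lt;/CODE&gt; returns at most 10,000 results by default, so it can silently drop events.&lt;/P&gt;</description>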
      <pubDate>Fri, 25 Aug 2017 15:21:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Dev/How-do-I-deduplicate-events-with-such-conditions/m-p/288258#M3579</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2017-08-25T15:21:29Z</dc:date>
    </item>
  </channel>
</rss>

