<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: dedup, distinct, similar in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71149#M180794</link>
    <description>&lt;P&gt;Looks like Sorkin also addressed this in:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://answers.splunk.com/questions/12396/dedup-and-multivalued-fields"&gt;http://answers.splunk.com/questions/12396/dedup-and-multivalued-fields&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 31 Mar 2011 00:37:30 GMT</pubDate>
    <dc:creator>hazekamp</dc:creator>
    <dc:date>2011-03-31T00:37:30Z</dc:date>
    <item>
      <title>dedup, distinct, similar</title>
      <link>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71147#M180792</link>
      <description>&lt;P&gt;All,&lt;/P&gt;

&lt;P&gt;I am trying to remove duplicate values from a list of email addresses.
First, I am loading this from a CSV; inside that CSV is a semicolon-delimited list of email recipients. In that list, some of the email recipients are duplicated. Obviously, each of them only received the email once, and that is the metric I want.&lt;/P&gt;

&lt;P&gt;So, I've put together a query to start that looks like this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="email" sender="jgauthier*" subject="Specific email" | eval recipientlist=split(recipient, ";")
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;So, my recipientlist now contains the duplicate email addresses (taken from recipient, but split up).&lt;/P&gt;

&lt;P&gt;For one email, it appears that it's gone to the same person twice (since they are in the list twice).&lt;/P&gt;

&lt;P&gt;I've played with dedup, but it doesn't seem to work in this regard.&lt;/P&gt;

&lt;P&gt;Thanks for the help.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Mar 2011 00:25:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71147#M180792</guid>
      <dc:creator>jgauthier</dc:creator>
      <dc:date>2011-03-31T00:25:44Z</dc:date>
    </item>
    <item>
      <title>Re: dedup, distinct, similar</title>
      <link>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71148#M180793</link>
      <description>&lt;P&gt;This is likely caused by the way dedup is behaving with the MV field created by eval-split.  Try:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="email" sender="jgauthier*" subject="Specific email" | eval recipientlist=split(recipient, ";") | stats count by recipientlist | fields - count
&lt;/CODE&gt;&lt;/PRE&gt;
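
&lt;P&gt;If the goal is instead to drop repeats inside each event's own list, here is a minimal sketch. It assumes a Splunk version that provides the mvdedup eval function, which nothing in this thread confirms for your install, and uniquecount is a hypothetical field name used only for illustration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype="email" sender="jgauthier*" subject="Specific email" | eval recipientlist=mvdedup(split(recipient, ";")) | eval uniquecount=mvcount(recipientlist)
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here mvdedup should leave one copy of each address in the multivalued field, and mvcount then reports how many distinct recipients that event had. I have not run this against your data, so treat it as a starting point.&lt;/P&gt;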

&lt;P&gt;If you want the count, just remove the trailing "| fields - count" from the stats search above.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Mar 2011 00:32:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71148#M180793</guid>
      <dc:creator>hazekamp</dc:creator>
      <dc:date>2011-03-31T00:32:17Z</dc:date>
    </item>
    <item>
      <title>Re: dedup, distinct, similar</title>
      <link>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71149#M180794</link>
      <description>&lt;P&gt;Looks like Sorkin also addressed this in:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://answers.splunk.com/questions/12396/dedup-and-multivalued-fields"&gt;http://answers.splunk.com/questions/12396/dedup-and-multivalued-fields&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Mar 2011 00:37:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71149#M180794</guid>
      <dc:creator>hazekamp</dc:creator>
      <dc:date>2011-03-31T00:37:30Z</dc:date>
    </item>
    <item>
      <title>Re: dedup, distinct, similar</title>
      <link>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71150#M180795</link>
      <description>&lt;P&gt;I saw Sorkin's thread before I posted and worked with it.&lt;BR /&gt;
But neither that effort nor the one above seemed to accomplish this.  Simply doing "stats count by recipientlist" gives me a count of X when the string exists multiple times in the recipientlist.  "| fields - count" just gives me the field, even if I have 100 distinct events with a field containing duplicate values.&lt;BR /&gt;
Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 31 Mar 2011 00:46:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71150#M180795</guid>
      <dc:creator>jgauthier</dc:creator>
      <dc:date>2011-03-31T00:46:57Z</dc:date>
    </item>
    <item>
      <title>Re: dedup, distinct, similar</title>
      <link>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71151#M180796</link>
      <description>&lt;P&gt;I am not sure why stats is not satisfying the use case of "trying to remove duplicate values...".  I tested this locally and if you have:&lt;/P&gt;

&lt;P&gt;Event 1:&lt;BR /&gt;
recipient=a;b;c;c&lt;/P&gt;

&lt;P&gt;Event 2:&lt;BR /&gt;
recipient=x;y;z;a&lt;/P&gt;

&lt;P&gt;This search:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=_internal | head 1 | eval recipient="a;b;c;c" | append [search index=_internal | head 1 | eval recipient="x;y;z;a"] | eval recipientList=split(recipient, ";") | stats count by recipientList
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Produces:&lt;/P&gt;

&lt;P&gt;a   2&lt;BR /&gt;
b   1&lt;BR /&gt;
c   2&lt;BR /&gt;
x   1&lt;BR /&gt;
y   1&lt;BR /&gt;
z   1&lt;/P&gt;

&lt;P&gt;Which is a consolidated list of email addresses.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Sep 2020 09:26:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/dedup-distinct-similar/m-p/71151#M180796</guid>
      <dc:creator>hazekamp</dc:creator>
      <dc:date>2020-09-28T09:26:52Z</dc:date>
    </item>
  </channel>
</rss>

