<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: email partial match in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665711#M228391</link>
    <pubDate>Fri, 20 Oct 2023 18:04:02 GMT</pubDate>
    <dc:creator>yuanliu</dc:creator>
    <dc:date>2023-10-20T18:04:02Z</dc:date>
    <item>
      <title>email partial match</title>
      <link>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665665#M228380</link>
      <description>&lt;P&gt;For my mail logs in JSON format, I created the table below with my Splunk query:&lt;/P&gt;&lt;TABLE border="1" width="56.25%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="25%" height="25px"&gt;mail from&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;mail sub&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;mail to&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%" height="25px"&gt;ABC&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;account created for A&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;abc@a.com&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%" height="25px"&gt;ABC&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;account created for B&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;bcd@a.com&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="25%" height="25px"&gt;ABC&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;account created for C&lt;/TD&gt;&lt;TD width="25%" height="25px"&gt;efg@a.com&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my splunk query I apply dedup on "mail sub".&amp;nbsp; as you can see unique but very similar subject remains in table which I want to further become joined or considered as 1 row.&lt;/P&gt;&lt;P&gt;My ask: what are the possible ways to partially match table column values and combine the matching rows into one? 
Ideally the matching logic would use two columns (mail from and mail sub). The output I am after looks like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="1" width="589.5px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="196.047px" height="25px"&gt;mail from&lt;/TD&gt;&lt;TD width="196.117px" height="25px"&gt;mail sub&lt;/TD&gt;&lt;TD width="98.2422px" height="25px"&gt;mail to&lt;/TD&gt;&lt;TD width="98.0938px"&gt;count&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="196.047px" height="25px"&gt;ABC&lt;/TD&gt;&lt;TD width="196.117px" height="25px"&gt;account created for A&lt;/TD&gt;&lt;TD width="98.2422px" height="25px"&gt;abc@a.com&lt;/TD&gt;&lt;TD width="98.0938px"&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The count of 3 comes on the basis of partial match in unique subject and mail from combined.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Oct 2023 13:04:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665665#M228380</guid>
      <dc:creator>ritzz</dc:creator>
      <dc:date>2023-10-20T13:04:22Z</dc:date>
    </item>
    <item>
      <title>Re: email partial match</title>
      <link>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665667#M228382</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/261203"&gt;@ritzz&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Did you try using stats?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;your-search&amp;gt;
| stats values(mail_sub) AS mail_sub values(mail_to) AS mail_to BY mail_from&lt;/LI-CODE&gt;&lt;P&gt;The only problem is that the lists in mail_sub and mail_to aren't aligned, because each list is sorted alphabetically on its own.&lt;/P&gt;&lt;P&gt;If you want aligned values, you have to combine them into one field before the stats:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;your-search&amp;gt;
| eval mail=mail_sub." - ".mail_to
| stats values(mail) AS mail BY mail_from&lt;/LI-CODE&gt;&lt;P&gt;Ciao.&lt;/P&gt;&lt;P&gt;Giuseppe&lt;/P&gt;</description>
      <pubDate>Fri, 20 Oct 2023 10:36:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665667#M228382</guid>
      <dc:creator>gcusello</dc:creator>
      <dc:date>2023-10-20T10:36:54Z</dc:date>
    </item>
    <item>
      <title>Re: email partial match</title>
      <link>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665711#M228391</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;In my splunk query I apply dedup on "mail sub".&amp;nbsp; as you can see unique but very similar subject remains in table which I want to further become joined or considered as 1 row.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;I have a slightly different reading of the OP's intention based on these sentences. &amp;nbsp;Do you mean you want to group by the mail subject's similarity, such as "account created for *"? &amp;nbsp;If so, you must realize that "&lt;EM&gt;similar&lt;/EM&gt;" is a highly subjective word. &amp;nbsp;Unless you spell out precise criteria to determine similarity, you will need a natural language processing tool rather than Splunk search.&lt;/P&gt;&lt;P&gt;Even if my reading of your intention is correct, and "account created for" is one such criterion for "similarity", your illustrated single-row output is still wrong. &amp;nbsp;Do you mean something like the following?&lt;/P&gt;&lt;TABLE border="1" width="589.5px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="196.047px" height="25px"&gt;mail from&lt;/TD&gt;&lt;TD width="196.117px" height="25px"&gt;mail sub&lt;/TD&gt;&lt;TD width="98.2422px" height="25px"&gt;mail to&lt;/TD&gt;&lt;TD width="98.0938px"&gt;count&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="196.047px" height="25px"&gt;ABC&lt;/TD&gt;&lt;TD width="196.117px" height="25px"&gt;account created for *A, B, C*&lt;/TD&gt;&lt;TD width="98.2422px" height="25px"&gt;&lt;P&gt;abc@a.com&lt;BR /&gt;bcd@a.com&lt;BR /&gt;efg@a.com&lt;/P&gt;&lt;/TD&gt;&lt;TD width="98.0938px"&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Not only that: you also mentioned applying dedup on mail sub alone. &amp;nbsp;That is quite counterproductive to accurate counting because you are asking for "&lt;SPAN&gt;count ... on the basis of partial match in unique subject and mail from combined." 
&amp;nbsp;At the very minimum, you must dedup on mail from and mail sub; you SHOULD probably also include mail to in that list for the count to make sense.&lt;/SPAN&gt;&amp;nbsp; But I'll leave those decisions to you.&lt;/P&gt;&lt;P&gt;Now, to use&amp;nbsp;"account created for *" as the partial match: there are many ways to do that. &amp;nbsp;Here is one:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| rex field="mail sub" "(?&amp;lt;similarity&amp;gt;account created for)\s+(?&amp;lt;disimilarity&amp;gt;.+)"
| stats values(disimilarity) as disimilarity values("mail to") as "mail to" by "mail from" similarity
| eval similarity = similarity . " *" . mvjoin(disimilarity, ", ") . "*"
| fields - disimilarity&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This will give you&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;mail from&lt;/TD&gt;&lt;TD&gt;similarity&lt;/TD&gt;&lt;TD&gt;&lt;DIV class=""&gt;mail to&lt;/DIV&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ABC&lt;/TD&gt;&lt;TD&gt;account created for *A, B, C*&lt;/TD&gt;&lt;TD&gt;&lt;DIV class=""&gt;abc@a.com&lt;/DIV&gt;&lt;DIV class=""&gt;bcd@a.com&lt;/DIV&gt;&lt;DIV class=""&gt;efg@a.com&lt;/DIV&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Hope this helps. &amp;nbsp;Here is an emulation that you can play with and compare with real data&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults format=csv data="mail from,	mail sub,	mail to
ABC,	account created for A,	abc@a.com
ABC,	account created for B,	bcd@a.com
ABC,	account created for C,	efg@a.com"
``` data emulation above ```&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Oct 2023 18:04:02 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/email-partial-match/m-p/665711#M228391</guid>
      <dc:creator>yuanliu</dc:creator>
      <dc:date>2023-10-20T18:04:02Z</dc:date>
    </item>
  </channel>
</rss>

