Splunk Search

email partial match

ritzz
Loves-to-Learn Lots

for my mail logs in JSON format, with my splunk query I created below table

mail frommail submail to
ABCaccount created for Aabc@a.com
ABCaccount created for Bbcd@a.com
ABCaccount created for Cefg@a.com

 

In my splunk query I apply dedup on "mail sub".  as you can see unique but very similar subject remains in table which I want to further become joined or considered as 1 row.

my ask: what are the possible  way that I can partially match table column values and they combined into 1 .? in matching logic if somehow we can use two columns for matching (mail from and mail sub)

 

mail frommail submail tocount
ABCaccount created for Aabc@a.com3

 

count 3 is coming on the basis of partial match in unique subject and mail from combined.

 

Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

In my splunk query I apply dedup on "mail sub".  as you can see unique but very similar subject remains in table which I want to further become joined or considered as 1 row.

I have a slightly different reading of the OP's intention based on these sentences.  Do you mean you want to group by mail subject's similarity, such as "account created for *"?  If so, you must realize that "similar" is a highly subjective word.  Unless you spell out precise criteria to determine similarity, you must look for natural language processing tool rather than Splunk search.

Suppose my reading of your intention is correct, and that "account created for" is one such criterion for "similarity", your illustrated single-row output is still wrong.  Do you mean something like

mail frommail submail tocount
ABCaccount created for *A, B, C*

abc@a.com
bcd@a.com
efg@a.com

3

Not only that.  You also mentioned dedup mail sub alone.  That is quite counterproductive to accurate counting because you are asking for "count ... on the basis of partial match in unique subject and mail from combined."  At the very minimum, you must dedup on mail from and mail sub; you SHOULD probably also add mail to in that list for the count to make sense.  But I'll leave those decisions to you.

Now, to use "account created for *" as partial match.  There are many ways to do that.  Here is one

 

| rex field="mail sub" "(?<similarity>account created for)\s+(?<disimilarity>.+)"
| stats values(disimilarity) as disimilarity values("mail to") as "mail to" by "mail from" similarity
| eval similarity = similarity . " *" . mvjoin(disimilarity, ", ") . "*"
| fields - disimilarity

 

This will give you

mail fromsimilarity
mail to
ABCaccount created for *A, B, C*
abc@a.com
bcd@a.com
efg@a.com

Hope this helps.  Here is an emulation that you can play with and compare with real data

 

| makeresults format=csv data="mail from,	mail sub,	mail to
ABC,	account created for A,	abc@a.com
ABC,	account created for B,	bcd@a.com
ABC,	account created for C,	efg@a.com"
``` data emulation above ```

 

gcusello
SplunkTrust
SplunkTrust

Hi @ritzz,

did you tried to use stats:

<your-search>
| stats values(mail_sub) AS mail_sub values(mail_to) AS mail_to BY mail_from

the only probleme is that the lista in mail _sub and mail_to aren't aligned, because they are sorted in alphabetically order one by one.

if you want to have aligned values you have to combine them:

<your-search>
| eval mail=mail_sub." - ".mail_to
| stats values(mail) AS mail BY mail_from

Ciao.

Giuseppe

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Are you leveraging automation to its fullest potential in your threat detection strategy?Our upcoming Security ...

Can’t Make It to Boston? Stream .conf25 and Learn with Haya Husain

Boston may be buzzing this September with Splunk University and .conf25, but you don’t have to pack a bag to ...

Splunk Lantern’s Guide to The Most Popular .conf25 Sessions

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...