Splunk Search

email partial match

ritzz
Loves-to-Learn Lots

for my mail logs in JSON format, with my splunk query I created below table

mail frommail submail to
ABCaccount created for Aabc@a.com
ABCaccount created for Bbcd@a.com
ABCaccount created for Cefg@a.com

 

In my splunk query I apply dedup on "mail sub".  as you can see unique but very similar subject remains in table which I want to further become joined or considered as 1 row.

my ask: what are the possible  way that I can partially match table column values and they combined into 1 .? in matching logic if somehow we can use two columns for matching (mail from and mail sub)

 

mail frommail submail tocount
ABCaccount created for Aabc@a.com3

 

count 3 is coming on the basis of partial match in unique subject and mail from combined.

 

Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

In my splunk query I apply dedup on "mail sub".  as you can see unique but very similar subject remains in table which I want to further become joined or considered as 1 row.

I have a slightly different reading of the OP's intention based on these sentences.  Do you mean you want to group by mail subject's similarity, such as "account created for *"?  If so, you must realize that "similar" is a highly subjective word.  Unless you spell out precise criteria to determine similarity, you must look for natural language processing tool rather than Splunk search.

Suppose my reading of your intention is correct, and that "account created for" is one such criterion for "similarity", your illustrated single-row output is still wrong.  Do you mean something like

mail frommail submail tocount
ABCaccount created for *A, B, C*

abc@a.com
bcd@a.com
efg@a.com

3

Not only that.  You also mentioned dedup mail sub alone.  That is quite counterproductive to accurate counting because you are asking for "count ... on the basis of partial match in unique subject and mail from combined."  At the very minimum, you must dedup on mail from and mail sub; you SHOULD probably also add mail to in that list for the count to make sense.  But I'll leave those decisions to you.

Now, to use "account created for *" as partial match.  There are many ways to do that.  Here is one

 

| rex field="mail sub" "(?<similarity>account created for)\s+(?<disimilarity>.+)"
| stats values(disimilarity) as disimilarity values("mail to") as "mail to" by "mail from" similarity
| eval similarity = similarity . " *" . mvjoin(disimilarity, ", ") . "*"
| fields - disimilarity

 

This will give you

mail fromsimilarity
mail to
ABCaccount created for *A, B, C*
abc@a.com
bcd@a.com
efg@a.com

Hope this helps.  Here is an emulation that you can play with and compare with real data

 

| makeresults format=csv data="mail from,	mail sub,	mail to
ABC,	account created for A,	abc@a.com
ABC,	account created for B,	bcd@a.com
ABC,	account created for C,	efg@a.com"
``` data emulation above ```

 

gcusello
SplunkTrust
SplunkTrust

Hi @ritzz,

did you tried to use stats:

<your-search>
| stats values(mail_sub) AS mail_sub values(mail_to) AS mail_to BY mail_from

the only probleme is that the lista in mail _sub and mail_to aren't aligned, because they are sorted in alphabetically order one by one.

if you want to have aligned values you have to combine them:

<your-search>
| eval mail=mail_sub." - ".mail_to
| stats values(mail) AS mail BY mail_from

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Splunk Observability for AI

Don’t miss out on an exciting Tech Talk on Splunk Observability for AI!Discover how Splunk’s agentic AI ...

Splunk Enterprise Security 8.x: The Essential Upgrade for Threat Detection, ...

Watch On Demand the Tech Talk on November 6 at 11AM PT, and empower your SOC to reach new heights! Duration: ...

Splunk Observability as Code: From Zero to Dashboard

For the details on what Self-Service Observability and Observability as Code is, we have some awesome content ...