How to use Splunk to find slight variations in email message_subjects and file_names?

packet_hunter
Contributor

I am hoping to find a way to sift through large volumes of email to find messages with similar subjects or similar attachment names.

Currently I search by subject or by attachment name.

For example:

index=mail sourcetype="mail" 
    [search index=mail sourcetype="mail" message_subject = *<something>*  |stats count by internal_message_id | fields internal_message_id]
    |eval Time=strftime(_time, "%H:%M:%S") | eval Date=strftime(_time, "%A %F") 
    |stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(file_name) as AttachmentName list(attachment_type) as AttachmentType list(vendor_action) as status values(Time) as Time values(Date) as Date by internal_message_id 

or

index=mail sourcetype="mail" 
    [search index=mail sourcetype="mail" file_name = *<something>*  |stats count by internal_message_id | fields internal_message_id]
    |eval Time=strftime(_time, "%H:%M:%S") | eval Date=strftime(_time, "%A %F") 
    |stats list(message_subject) as subj list(sender) as sender list(recipient) as recp list(file_name) as AttachmentName list(attachment_type) as AttachmentType list(vendor_action) as status values(Time) as Time values(Date) as Date by internal_message_id 

I am looking to find all variations or patterns of similar emails, for example:
subj = Order-008796, Order-008948, Order-009485, etc.
AttachmentName = Order#00879, Order-008948, Order#009485, etc. (extensions like .doc are already parsed out natively in the log)

What's the best way to find similar patterns? Cluster? Any other ideas?

Thank you

richgalloway
SplunkTrust

There are a few ways to do that, depending on the patterns you want to match. One is to use wildcards in the base search:

index=mail sourcetype="mail" message_subject="Order-*" | ...

or use like:

index=mail sourcetype="mail" | where like(message_subject,"Order-%") | ...

or use regex:

index=mail sourcetype="mail" | regex message_subject = "Order-\d{6}" | ...
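
If the exact pattern is not known ahead of time, another option is to normalize each subject and then count how many messages share the same shape. The following is only a sketch, assuming the field names from the question (message_subject, internal_message_id); subj_pattern, messages, and examples are names made up for illustration:

```
index=mail sourcetype="mail"
| eval subj_pattern=replace(message_subject, "\d+", "N")
| stats count dc(internal_message_id) as messages values(message_subject) as examples by subj_pattern
| sort - count
```

With this, Order-008796 and Order-008948 would both collapse to Order-N and land in the same row. The same eval could be applied to file_name, possibly with a second replace to fold separators such as # and - together.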
---
If this reply helps you, Karma would be appreciated.

packet_hunter
Contributor

Thank you, Rich. Before I accept your answer, I just wanted to get your opinion on using cluster. When would you typically use it?

Thank you

richgalloway
SplunkTrust

I haven't used the cluster command, but it could apply in this case. I wonder what you'd get from

index=mail sourcetype="mail" | cluster field=message_subject | ...
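
As an untested starting point, something like the following might show which subjects end up grouped together. It assumes the field names from the question and relies on the cluster command's labelonly option, which keeps every event and adds a cluster_label field; t=0.8 is the documented default threshold and is written out only so it is easy to tune:

```
index=mail sourcetype="mail"
| cluster field=message_subject t=0.8 labelonly=true
| stats count values(message_subject) as subjects by cluster_label
| sort - count
```

Because subjects like Order-008796 and Order-008948 share only part of their text, a lower t or a different match setting (for example match=ngramset) may be needed before they fall into the same cluster.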

---
If this reply helps you, Karma would be appreciated.

packet_hunter
Contributor

Thanks for the reply. I was thinking of cluster as more of an automatic check that needs fewer manual changes to the query.

I will experiment a bit and post a new question in a while.

Thank you