Hi, I want to compare email headers and count by dest_port=25. (I'm trying to detect phishing emails via the email subject.)
If the same subject appears twice in the email headers, I want to return the number of occurrences for dest_port=25.
source=* dest_port=25
| rex field=src_content max_match=0 "(?P<occurredSubject>Subject: Fw: Order Inquiry)"
| eval count=mvcount(occurredSubject)
| stats sum(count) as totalOccurrence
But it doesn't work. Any help?
Assuming you have 1 email message per event:
Extract the subject, as already demonstrated by @auraria1 and do a count by subject and then filter for counts bigger than 1.
source=* dest_port=25
| rex field=_raw "Subject\:\s(?<subject>.+)"
| stats count by subject
| where count>1
If you want to retrieve the entire event for those events whose subject occurs more than once, use eventstats instead of stats:
source=* dest_port=25
| rex field=_raw "Subject\:\s(?<subject>.+)"
| eventstats count by subject
| where count>1
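To see the difference between the two searches outside Splunk, here is a rough Python sketch of the same logic. The three events are made-up stand-ins for one message per event (the sender "Carol" is hypothetical), and `extract_subject` is an illustrative helper, not anything Splunk provides. Python's `re` treats `.` as "anything but a newline" by default, which is what the `.+` in the rex above relies on as well.

```python
import re
from collections import Counter

# Hypothetical stand-ins for three SMTP events (one email message per event).
events = [
    "DATA\nFrom: Alice\nSubject: Fw: Order Inquiry\nContent-Type: multipart/mixed;",
    "DATA\nFrom: Carol\nSubject: Fw: Order Inquiry\nContent-Type: text/plain",
    "DATA\nFrom: Alice\nSubject: Xmas of pleasure for your couple!\nContent-Type: text/plain",
]

# Same idea as: rex field=_raw "Subject\:\s(?<subject>.+)"
SUBJECT_RE = re.compile(r"Subject:\s(?P<subject>.+)")


def extract_subject(event):
    """Return the subject line of one event, or None if there is none."""
    match = SUBJECT_RE.search(event)
    return match.group("subject").strip() if match else None


subjects = [extract_subject(e) for e in events]
counts = Counter(s for s in subjects if s is not None)

# Roughly `stats count by subject | where count>1`: one row per repeated subject.
repeated_subjects = {s: n for s, n in counts.items() if n > 1}

# Roughly `eventstats count by subject | where count>1`: the full events are kept.
repeated_events = [
    e for e, s in zip(events, subjects) if s is not None and counts[s] > 1
]

print(repeated_subjects)
print(len(repeated_events))
```

The first result collapses the events into one row per subject, while the second keeps every original event whose subject is shared with at least one other event.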
@FrankVI @auraria1 @nittala_surya, thank you so much for the answer! I really appreciate it! It worked!
Hello @weicheng98,
Is it possible to provide some sample events? I think there might be a mistake in your rex statement.
sample event from src_content:
MAIL FROM:
RCPT TO:
DATA
Date: Mon, 12 Mar 2018 15:47:20
From: Alice
User-Agent: Mozilla/5.0
To:Bob.@here.com
Subject: Fw: Order Inquiry
Content-Type: multipart/mixed;
Dear Alice
blah blah blah
Another sample event
MAIL FROM:<>
RCPT TO:
DATA
Received: from htgz ([131.131.131.131])
Message-ID: 20081229155033.5070401@rllss.com
Date: Mon, 29 Dec 2008 15:50:33 -0500
From: "Alice"
User-Agent: Thunderbird
To: chapman@progress1.com
Subject: Xmas of pleasure for your couple!
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
you have problems with your account
Is subject in its own field? If not, this makes it a bit more difficult.
You can create a subject field using the following:
| rex field=_raw "Subject:\s(?<Subject>.*)Content-Type" | stats count by Subject | sort - count
If so, it'll make searching wayyyyyy easier. You can add this as a field extraction so that Splunk does it for you.
In regards to your other question, are you specifically looking for only emails with fw: Order Inquiry as a subject to compare number of emails coming in? Or all subjects?
Hi, as you can see from my sample events, src_content contains this stream of data, so that's why I have to use regex.
I'm trying to compare all subjects and, for those that appear more than once, return the number of occurrences.
The hardcoded regex is just there to check whether I can match that subject in my stream of events.
Is there any way I can compare events where the subject appears more than once?
Thanks for the events. Give this a try. The rex here creates a new field called "new_subject".
source=* dest_port=25
| rex field=_raw "Subject\:\s(?<new_subject>.+)"
| eval count=mvcount(new_subject)
| stats sum(count) as totalOccurrence
Try this for your regex:
fw:\sorder\sinquiry
Wouldn't it be easier to just do a where modifier and by stats?
Try the below, this will create a new field called subject, count based on the subject name, and show only results with more than 2 events.
source=* dest_port=25
| rex field=src_content max_match=0 "(?P<Subject>Subject: Fw: Order Inquiry)"
| stats count by Subject
| where count > 2
Hi @auraria1, thank you so much! But how do I improve my query so that my rex isn't a hardcoded match? For example, I want to check whether two events contain the same title in src_content and then return the result.
I really, really appreciate your help, as some of my previous questions posted online weren't answered.
Wait I think I misunderstood the original question, is the issue that the regex isn't matching properly?
Is that why you're having issues with the hardcoded regex?
Can you provide 2-3 example email subjects so I can take a look and see why it isn't working?
I'd also like to point out that, as you said, it creates a new field called subject. The number of occurrences is correct, but why is it that when I change the regex, it returns the regex text instead of the subject found in src_content?
For example, if I just put:
| rex field=src_content max_match=0 "(?P<Subject>Subject: Fw: )"
and the Splunk stream src_content contains "Subject: Fw Order Inquiry",
it returns "Fw:" as the subject instead of the matched text from src_content. Why is that so?
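A likely explanation: a named capture group only stores the text matched inside its own parentheses, so a group that contains nothing but literal text can only ever return that literal. The small Python sketch below illustrates this, assuming Python's `re` handles `(?P<name>...)` groups the same way rex does here. The `src_content` string is a made-up sample modelled on the first sample event, and the variable names are illustrative only.

```python
import re

# Made-up sample modelled on the first sample event (CRLF line endings assumed).
src_content = (
    "To: Bob\r\n"
    "Subject: Fw: Order Inquiry\r\n"
    "Content-Type: multipart/mixed;"
)

# Literal text inside the group: the capture can only ever be that literal.
literal = re.search(r"(?P<subject>Subject: Fw: )", src_content)
literal_capture = literal.group("subject")

# Literal prefix outside the group, a pattern inside it: the capture is
# whatever follows "Subject: " up to the end of that line.
flexible = re.search(r"Subject:\s(?P<subject>[^\r\n]+)", src_content)
flexible_capture = flexible.group("subject")

print(repr(literal_capture))
print(repr(flexible_capture))
```

Keeping the `Subject:` prefix outside the group and putting a pattern such as `[^\r\n]+` inside it is what lets the field hold the actual subject, which can then be counted without hardcoding any particular title.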