Splunk Search

Is there a way to count emails and then compare?

Loves-to-Learn Lots

Okay, so this is quite theoretical... The goal of this search is to count the incoming domains that have sent more than 200 unique emails. Then I need to count the outgoing emails to those SAME domains and compare the two as a percentage, looking for domains where the conversation rate is, let's say, above 30%-45%...

What are we doing with this?

Answer: We are going to count the domains that send email IN and that we clearly RESPOND to, without getting into the mixture of RE:'s, FWD's and Out of Office replies... these will not and should NOT make up 45% of the conversation. We also have to be careful NOT to count NEW conversations: 1-2 emails in compared to 1 reply back is not 50% for the purposes of this search. We have to look at a wide breadth of time to determine the real senders, and get rid of apple.card, itunes.com, apple.com and other things that "SEND" large quantities of email but never have an actual conversation.

Do you have a search to play with on this?

Answer: Kind of... which is why I'm posting here hoping for a better logic



index=email "filter.routeDirection"=inbound 
| rex field=envelope.from "\@(?<domainIN>[^ ]*)"
| stats dc(envelope.from) as num_count by domainIN
```Possible combination```
| join type=inner domainIN
    [search index=email sourcetype=pps_messagelog "filter.routeDirection"=outbound
| rex field=msg.header.to "\@(?<domainIN2>.*)"\>]
| stats dc(msg.header.to) as num_count2 by domainIN2
| where count >200



Data is Proofpoint, using sourcetype=pps_messagelog. Above is the regex that captures the domains seen where routeDirection=inbound. Now I need to compare the same domains seen in domainIN against the opposite direction... I could see where a "keep a count of large senders" lookup could be good, but the end goal is simply to make a list of the domains we KNOW we talk to. That list will then get consumed by other security stacks as a way to say "THIS IS A FRIEND", basically. As a use case, you could send the list to a Threat Intel platform for "domain watcher" status to catch "look-a-like" domains that pop up, or, if one of those domains got compromised, you would KNOW it's a threat to you as well...

I hope that makes sense.  If not, I can answer questions, and hopefully my brain can help erode the terrible code you see above... cause it's not working..!!


Loves-to-Learn Lots

Thanks for the reply ITWhisperer,

What about this kind of search... I have redrawn the logic...




index=email sourcetype=pps_messagelog src_user!="" recipient!="" recipient!=*ppops.net* ("filter.routeDirection"=inbound OR "filter.routeDirection"=outbound)
| rename envelope.rcpts{} AS TO, src_user AS FROM
| eval TO=lower(TO)
| eval FROM=lower(FROM)
| mvexpand FROM
| rex field=FROM \@(?<FROM>\w+.\w{3})
| mvexpand TO
| rex field=TO "@(?<TO>.*)"
| stats count by FROM, TO
| eval in_both=if(FROM==1 AND TO==1,1,0)
| addcoltotals
| table FROM TO in_both
| where in_both >1


The goal in this is to do a bit of a compare.... I attempted with a | chart dc(TO) by FROM or any level... and when doing the table, only one will show... it's become frustrating... it's like I'm right there.



Does the envelope.rcpts{} field only exist in events where the filter.routeDirection field equals outbound, and does the src_user field only exist in events where filter.routeDirection equals inbound?

In both these cases, can they have multiple values in a single event?


Loves-to-Learn Lots

The answer is... they exist in both directions. This would be a lot easier if the data was inbound.recipient or something like that. You could build this, I suppose...

Both recipient fields can also hold multiple values. As you probably know, outbound email is typically one-to-many, since one person clicks send to many recipients. So envelope.from or msg.header.from will typically hold one value, whereas envelope.rcpts{} or msg.header.to can and will contain multiple values, in a mixture of "Display Name" <displayname@domain.com> format and bare email addresses.

It would seem to me we need separate searches: first count inbound, then a join or subsearch that gets the same fields when the direction is outbound. Then limit the inbound to domains with 200 unique emails, and find where the outbound count to the same domain is greater than 35% of that.

Does this help?





This would be easier with some (anonymised) examples of the events you are working with.

Having said that, try replacing

| rename envelope.rcpts{} AS TO, src_user AS FROM
| eval TO=lower(TO)
| eval FROM=lower(FROM)

with

| eval TO=if('filter.routeDirection'="outbound", lower('envelope.rcpts{}'), null())
| eval FROM=if('filter.routeDirection'="inbound", lower(src_user), null())

This will create the TO and FROM fields only for events going in the relevant directions.

If TO is now a multi-value field (in the splunk sense), you can then use mvexpand to duplicate the events for each value in the TO field.

If TO is not a multi-value field, but instead it is a string containing multiple addresses, you can use rex to extract the domains from this field.

| rex field=TO max_match=0 "\@(?<domain>[\w\.]*)"

Extract the domain from the FROM field

| rex field=FROM "\@(?<domain>[\w\.]*)"

Then count the inbound and outbound events by domain and direction

| chart count by domain 'filter.routeDirection'

Then you are probably looking for domains where both counts are non-zero

| where inbound > 0 AND outbound > 0
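
To move from "both non-zero" to the thresholds mentioned in the question, a percentage can be derived from the two chart columns. This is only a sketch: the 200 and 30 cut-offs are taken from the original post, reply_pct is a made-up field name, and it assumes the chart produced columns named exactly inbound and outbound. Note that these are event counts per domain, not unique senders.

```spl
| eval reply_pct=round(100 * outbound / inbound, 1)
| where inbound > 200 AND reply_pct >= 30
| sort - reply_pct
```

Filtering on a percentage rather than two fixed counts keeps the rule meaningful for both small and very large senders.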

Loves-to-Learn Lots

I'm going to do what I can to manipulate some data and clean it up... sadly, I wish there were easier ways of doing this.

As stated previously, envelope.rcpts{} is present in both the inbound and the outbound events, so I feel that locking it to TO only when the direction is outbound is going to cause problems in that scenario. I ran the code and it wasn't able to produce any "domain" field from the rex given... I like the chart portion, and feel like a lot can be gotten from it. Let me poke at this and get a sample: I ran 15 minutes and it returned 699 events, which is just way too much to obfuscate the PII by hand...
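
One way to check the rex on its own, without exposing any real events, is to run it against a fabricated value with makeresults. The sketch below is not from the thread: the sample string is invented to mimic the mixed "Display Name" <address> and bare-address formats described earlier, and the character class is widened to include hyphens.

```spl
| makeresults
| eval TO=lower("\"Display Name\" <displayname@domain.com>, someone@example.org")
| rex field=TO max_match=0 "@(?<domain>[\w.-]+)"
| table TO domain
```

If this returns a multivalue domain field holding domain.com and example.org, the regex itself is fine, and the missing field in the real search more likely comes from the TO or FROM field being empty for those events.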



I am not sure I understand what you are saying about envelope.rcpts{} and the TO field.

My understanding is that you are only interested in envelope.rcpts{} if the direction in the event is outbound, i.e. which domains are you sending emails to?

Similarly, you are only interested in src_user if the direction in the event is inbound, i.e. which domains you have received emails from?

Is this right?


Loves-to-Learn Lots

Incorrect, the objective is:

VendorDomain1/2/3 emails in... those events will be filter.routeDirection=inbound with the vendor's domain in src_user.

DomainA emails out to VendorDomain1/2/3... those events will be filter.routeDirection=outbound with the sender's domain in src_user too.

This is because, regardless of direction, the fields are mapped as recipient/src_user per the CIM Email data model, which is really, really well mapped in this case, so even the base Email data model will map. Proofpoint also gives you a data model called "Proofpoint On Demand Email Security", so we can borrow from that concept as well.

The end goal here, @ITWhisperer, is to look at the emails coming in and whether we reply to them. Sadly, src_user and recipient are populated the same way no matter the direction. However, when I ran the search below for 90 days, I only got 3 domains back, which only told me which of OUR domains emailed out of our email gateway... I think my append is trash...

Here's my current search using part of your logic...

index=email sourcetype=pps_messagelog src_user!="" recipient!="" filter.routeDirection!=Internal recipient!=*ppops.net* (final_rule=pass OR final_rule=outbound_virus_clean)
| transaction msg.header.message-id
| rename envelope.rcpts{} as recipient, filter.routeDirection AS direction
| eval recipient=lower(recipient)
| mvexpand src_user
| mvexpand recipient
| makemv delim="([^,]+),?" allowempty=false recipient
| makemv delim="([^,]+),?" allowempty=false src_user
| rex field=recipient (.*@.*\.(?<recipient>.*\..*)$)
| rex field=recipient .*@(?<recipient2>.*\..*)$
| rex field=src_user (.*@.*\.(?<src_user>.*\..*)$)
| rex field=src_user .*@(?<src_user2>.*\..*)$
| eval domain=mvappend(src_user2,recipient2)
| chart count by domain direction
| where inbound > 200 AND outbound > 90

Walking through it: search index=email with the sourcetype, and don't return events with a blank src_user or recipient. Exclude internal mail and anything where the recipient ends in ppops.net (bounce backs/digests). Then I ask for final_rule=pass or outbound_virus_clean, which tells me the gateway didn't send the message to any quarantine, in or out... those are the rules on a good email in both directions.
Next come the renames of two fields, then the break out: for good measure I expand both fields, then rex them using the various rex's, then append the values of src_user2 and recipient2, as they are the cleanest and contain ONLY the domains. Then, using your chart count, I filter for domains where the people emailing us sent more than 200 emails and our replies back in 90 days was 30% of that... or at least 90 emails back...

I will eventually output this to a list and maybe run another check on it for clarity, then dump it as a lookup that can be fed to various places (threat intel, correlation rules, playbooks, etc.) and used as a source of truth, so to speak, for "This is a vendor"...
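
As a rough sketch of that last step, the filtered result could be written to a CSV lookup by appending something like the following to the end of the search. The file name known_vendor_domains.csv is hypothetical, and the field names assume the output of the chart above.

```spl
| fields domain inbound outbound
| outputlookup known_vendor_domains.csv
```

The list can then be read back with | inputlookup known_vendor_domains.csv, or matched against other searches with the lookup command.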

For those interested, and I know this is long winded... but the Data Model for Proofpoint is quite good.. 
source (proofpoint_message_log)
action_dkimv.rule (output of the rule that impacted the result of the sending email dkim rule)
action_dmarc.rule (output of the rule that impacted the result of the sending email dmarc rule)
action_spf.rule (output of the rule that impacted the result of the sending email spf rule)
connection.tls.inbound.version (what version was used to TLS on all inbound emails)
filter.actions.action (all actions taken on the mail)
filter.actions.isFinal (what was the final action)
filter.actions.module (what modules were used to give the actions/rule)
filter.actions.rule (what rules were applied to the mail)
filter.disposition (what was the ruling of the email based on decision tree)
filter.quarantine.folder (if it went to quarantine, what folder was it)
filter.routeDirection (what direction was it?  Internal, Inbound, or Outbound)
final_action (what was the final action PERIOD)
final_module (what was the last module that it went through)
final_rule (what was the final rule PERIOD) HINT: it maps directly to the RULES of each module
is_encrypted (was the email encrypted TRUE/FALSE)
msg.header.message-id (the message ID)
msg.header.subject (Better to always look at as logic in parser takes care of normalization of html subjects)
msg.header.to (who was the email to IS MULTIVALUED)
msg_header_from (who sent the email)


Loves-to-Learn Lots

Ok, you may have noticed I had a transaction on message-id... removing it has now surfaced the data that I need...

Now I have the | where inbound > 1000 AND outbound > 300

it's still going, but is showing me the relative nature of what I'm looking for...

Will keep you posted!
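
For anyone landing here later, the idea can be sketched without the transaction, the makemv calls or the mvappend, by picking only the external side of each message before extracting the domain. This is an untested outline under some assumptions: direction values are inbound/outbound, the external party is in src_user for inbound mail and in the CIM recipient field for outbound mail, and external is a made-up helper field.

```spl
index=email sourcetype=pps_messagelog src_user!="" recipient!="" recipient!=*ppops.net* ("filter.routeDirection"=inbound OR "filter.routeDirection"=outbound) (final_rule=pass OR final_rule=outbound_virus_clean)
| rename "filter.routeDirection" AS direction
| eval direction=lower(direction)
| eval external=if(direction=="inbound", src_user, recipient)
| mvexpand external
| eval external=lower(external)
| rex field=external max_match=0 "@(?<domain>[\w.-]+)"
| chart count OVER domain BY direction
| where inbound > 1000 AND outbound > 300
```

Because of the mvexpand, an outbound message is counted once per recipient address, so a single email sent to several people at the same vendor counts several times.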



The search you are joining with doesn't appear to return domainIN (your join field); it is returning domainIN2.

Your final stats does not return a count field, so your where command will not find any events in the pipeline.
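
To illustrate both points, here is a sketch of the original search with the same field name (domain) produced on both sides of the join and the where clause pointing at a field that actually exists. The names domain, senders_in and sent_out are made up for the example, and the regex is an assumption about the address format. Keep in mind that join subsearches are subject to result and runtime limits, so this may truncate over long time ranges.

```spl
index=email sourcetype=pps_messagelog "filter.routeDirection"=inbound
| rex field=envelope.from "@(?<domain>[\w.-]+)"
| eval domain=lower(domain)
| stats dc(envelope.from) AS senders_in BY domain
| where senders_in > 200
| join type=inner domain
    [ search index=email sourcetype=pps_messagelog "filter.routeDirection"=outbound
    | rex field=msg.header.to max_match=0 "@(?<domain>[\w.-]+)"
    | stats count BY domain
    | eval domain=lower(domain)
    | stats sum(count) AS sent_out BY domain ]
| table domain senders_in sent_out
```

The subsearch counts first and lowercases afterwards so that differently cased spellings of the same domain are merged by the second stats.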
