Splunk Search

How to edit my search to find total emails sent per user on a daily basis?

jwalzerpitt
Influencer

I am trying to pull stats that shows the average emails sent per user per day and I have the following search below, but running the search is not returning any results.:

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
| dedup MessageId sortby _time
| stats count(eval(RecipientAddress)) as recipient_count by SenderAddress
| table _time SenderAddress recipient_count Subject MessageId Size 
| timechart sum(recipient_count) as "Daily Total" span=1d 

Thx

0 Karma
1 Solution

masonmorales
Influencer

Try this:

 index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | dedup MessageId sortby _time
 | bin _time span=1d
 | stats count(RecipientAddress) as recipient_count by SenderAddress _time
 | timechart avg(recipient_count) as "Daily Average" 

If that doesn't work, could you post some anonymized sample data?

View solution in original post

adayton20
Contributor

In your search, you’re attempting to table the _time field when _time is not included as a field after your stats command, and also trying to create a timechart without the _time field as an included field after your stats command. You will not be able to retain that table after piping to a timechart. Not sure if you wanted a table or a timechart? Since your final pipe is to a timechart, I went with that first.

If you want the total emails sent per user per day with a timechart, try this:

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
| timechart span=1d dc(RecipientAddress) AS recipient_count by SenderAddress

This should provide you with a timechart of the distinct count of recipients by the sender with a span of 1 day over whatever time period you choose.

I noticed you are searching on exchange, and also specifying a wildcard on the SenderAddress field with the domain, which I interpreted as you wanting to look at a sender(s) email activity from a domain. With an index like exchange, depending on the size of your organization, this timechart might be difficult to read if you don’t specify a sender or a domain. The results in the timechart will truncate if you have more than 1000 senders in the timeframe.

Now, if you didn’t want a timechart and instead want a table with the information from the sender, try something like this:

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| stats count(RecipientAddress) AS recipient_total, values(Subject) AS Subject, values(MessageId) AS MessageId, values(Size) AS Size  by SenderAddress
| table SenderAddress, recipient_total, Subject, MessageId, Size

Counting the values of time for the search above can get messy since it will just group all the times together, but you could try using a function like first(_time) or last(_time) in front of the values if you want to know when the oldest or newest event occurred. For example,

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| stats count(RecipientAddress) AS recipient_total, first(_time) AS _time, values(Subject) AS Subject, values(MessageId) AS MessageId, values(Size) AS Size  by SenderAddress
| table _time, SenderAddress, recipient_total, Subject, MessageId, Size

This would give you the most recent event’s time out of all the events.

adayton20
Contributor

You're very welcome!

Hmm, I don't quite understand what you mean by not having the field. The recipient_count field is not a field from the exchange logs. It is a field created by you and I at search time by renaming the field after the transforming command "stats", for example: dc(RecipientAddress) AS recipient_count by SenderAddress. Whenever you place a field inside of a function, like dc(), or count() from your example, you can choose whether or not to rename the field using the "as" clause. I'm not sure why the stats eval condition is needed in this context? Is this a search you're trying to combine or change from a previous admin? It may help to provide us with a sanitized sample of the data and explain what is you're trying to do.

In the second search you mentioned expanding the search returns no results. The search is not working because it isn't composed correctly. The first part of the search is creating a statistical summary (stats count) following dedup (deleting duplicates) of the MessageId field. The arguments for the stats command is returning the count of MessageId as the field recipient_count. This search by itself will only return a single field: recipient_count, which for me, appears to return the resulting count of events with MessageId in them, renamed as the field "recipient_count". From here, you are trying to create an ordered table of the previous statistical summary value (the value of recipient_count) with other fields not included in the previous stats command, which is not possible to do. The timechart will also not work should you remove the table because timecharts are created with the timestamps of the events.

jwalzerpitt
Influencer

When you state, "The arguments for the stats command is returning the count of MessageId as the field recipient_count. This search by itself will only return a single field: recipient_count, which for me, appears to return the resulting count of events with MessageId in them, renamed as the field "recipient_count", what I'm trying to do is get a count of the number of recipients for a user. Being that I don't have a recipient_count field, I thought I could get a count of the number of recipients for a user by substituting the MessageId field (figuring if I send out an email to 10 people the MessageId would be listed 10 times).

Thx

0 Karma

jwalzerpitt
Influencer

Thx

When I check Exchange message tracking log fields (https://technet.microsoft.com/en-us/library/cc539064.aspx), recipient-count is listed as a field. Unfortunately, pulling the logs from the O365 service, this is a field that is not offered.

I'm trying to adapt and apply the search shown in the .conf2016 presentation, "Detecting the Adversary Post-Compromise with Threat Models and Behavioral Analytics" (around the 31 min mark) to look for anomalies. Search is as follows:

sourcetype="MSExchange"2010:MessageTracking" sender="xxxx@xxxx.com" recipient_count!=NONE 
| dedup message_id sortby _time 
| table _time directionality sender recipient message_subject message_id recipient_count total_bytes
| timecahrt sum(recipient_count) as daily_total span=1d
| eventstats median(daily_total) AS median, p25(daily_total) as p25, p75(daily_total) as p75, mean(daily_total) as mean
| eval iqr = p75 - p25 
| eval xplier = 2 
| eval low_lim = median - (iqr * xplier) 
| eval high_lim = median + (iqr * xplier) 
| eval anomaly = if(daily_total<low_lim OR daily_total > high_lim, daily_total,0)
| table _time daily_total anomaly

Thx again

0 Karma

jwalzerpitt
Influencer

Thx a lot for the various searches as they (and the breakdown of them) greatly help.

We're pulling O365 Exchange message trace logs which don't contain the recipient count field. However, we do have the MessageId field. I'm using the following search for total recipient count:

index=exchange NOT Status=Quarantined NOT Status=Failed NOT [|inputlookup Senderwhitelist.csv] SenderAddress=*.xyz.com 
| dedup MessageId sortby _time 
| stats count(eval(MessageId)) as recipient_count

However, when I expand the search to the following search below, no events are returned.

index=exchange NOT Status=Quarantined NOT Status=Failed NOT [|inputlookup Senderwhitelist.csv] SenderAddress=*.xyz.com 
| dedup MessageId sortby _time 
| stats count(eval(MessageId)) as recipient_count
| table _time SenderAddress RecipientAddress Subject recipient_count Size
| timechart sum(recipient_count) as daily_total span=1d
| eventstats median(daily_total) AS median, p25(daily_total) as p25, p75(daily_total) as p75, mean(daily_total) as mean
| eval iqr = p75 - p25 
| eval xplier = 2 
| eval low_lim = median - (iqr * xplier) 
| eval high_lim = median + (iqr * xplier) 
| eval anomaly = if(daily_total<low_lim OR daily_total > high_lim, daily_total,0)
| table _time daily_total anomaly
0 Karma

DalJeanis
Legend

The |stats count() as recipient_count command leaves in existence exactly one record with exactly one field - recipient_count. The following table command therefore has nothing to work against.

Change these two lines

 | dedup MessageId sortby _time 
 | stats count(eval(MessageId)) as recipient_count

to this, in this order

 | eventstats count as recipient_count by MessageId
 | dedup MessageId sortby _time 

and the results should pass through to the table.

Next, your timechart command should probably have a "by SenderAddress" clause.

After that, you need to start thinking in terms of what you have, and where you are going.

You will have, at that point, one record per day for each SenderAddress that sent an email that day, summing all the emails sent by that SenderAddress.

Your eventstats obviously needs to be by SenderAddress as well, and it needs to be included in the final table command with a sort clause afterward.

masonmorales
Influencer

Try this:

 index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | dedup MessageId sortby _time
 | bin _time span=1d
 | stats count(RecipientAddress) as recipient_count by SenderAddress _time
 | timechart avg(recipient_count) as "Daily Average" 

If that doesn't work, could you post some anonymized sample data?

somesoni2
Revered Legend

Try like this

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | dedup MessageId sortby _time
 | timechart count(RecipientAddress) as "Daily Total" span=1d 

OR

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | dedup MessageId sortby _time
 | timechart span=1d  count(RecipientAddress) as "Daily Total" by SenderAddress

OR

index=exchange NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | dedup MessageId sortby _time | bucket span=1d _time
| stats count(eval(RecipientAddress)) as recipient_count by _time SenderAddress
0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...