I have the following search for my email in which I pull the number of events per Recipient Address by Sender Address (basically looking for how many emails each user sends):
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com | stats count as ReceipientAddress by SenderAddress
I'm trying to accomplish two things things: 1) Pipe that out to eval for a new column called 'emailsent' so I can then apply eventstats variables (avg, stdev) as I'm looking to create a search that shows me users who are sending 2) the same Subject to multiple recipients
How do I add an eval command for 'emailsent' and how would I go about making sure the emails sent to multiple users have the same Subject?
Thx
If someone normally sends 1 email 90 percent of the time, but occasionally sends 5 or 10, it would trip the threshold. So, let's throw out all the single emails before we calculate avg and SD. And let's round the avg and stdev up using the ceiling function.
Also, let's set an absolute lower bound as well. Say, the person has to send at least 10 emails before tripping the rule.
Try this -
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| stats count by SenderAddress Subject
| search count > 1
| eventstats avg(count) as AvgOfCount, stdev(count) as StdevOfCount, max(count) as MaxOfCount by SenderAddress
| eval AvgOfCount = ceiling(AvgOfCount)
| eval StdevOfCount=ceiling(StdevOfCount)
| threshold = max(AvgOfCount + 4 * StdevOfCount,10)
| eval KeepMe = if(count >= threshold, 1,0)
| search KeepMe=1
Cripes, I reversed the test. Keep the ones that are GREATER than the threshold. Duh. Fixed.
If someone normally sends 1 email 90 percent of the time, but occasionally sends 5 or 10, it would trip the threshold. So, let's throw out all the single emails before we calculate avg and SD. And let's round the avg and stdev up using the ceiling function.
Also, let's set an absolute lower bound as well. Say, the person has to send at least 10 emails before tripping the rule.
Try this -
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| stats count by SenderAddress Subject
| search count > 1
| eventstats avg(count) as AvgOfCount, stdev(count) as StdevOfCount, max(count) as MaxOfCount by SenderAddress
| eval AvgOfCount = ceiling(AvgOfCount)
| eval StdevOfCount=ceiling(StdevOfCount)
| threshold = max(AvgOfCount + 4 * StdevOfCount,10)
| eval KeepMe = if(count >= threshold, 1,0)
| search KeepMe=1
Cripes, I reversed the test. Keep the ones that are GREATER than the threshold. Duh. Fixed.
Thx for the new search. I ran that and I'm down to 102 events, which is a great reduction.
However, let me throw another wrench into this if you don't mind 🙂
Right now when I pick a time from the time picker, I'm limiting the average to that exact time, but what I wanted to get a rolling average. For example, say I have a user who if I ran a search with eventstats avg(emailsent) for all-time returns an average of say 3.2. Running the same search but for the last 24 hours, the average is now 6.82.
if possible, I'd like to look at the average for the user for a larger time period (say 3 or 6 months) to get a more accurate look at their average, which would help smooth out some spikes. Would that be possible
Sure, anything is possible.
Tell you what, I'll promote the above comment (that worked) to an answer, you can accept it and we'll close this question.
Post a new thread with this as the starting search, asking how to get the average over a greater time. Basically you'll move all of that into a subsearch, then inner join only the current stats for the shorter term to the values returned from the subsearch.
Sounds good - thx
Posted question as, "How to create a search for moving average for email sent per user"
Thx
After mucking around a bit, I believe I came up with a decent search. Search is as follows:
| stats count as RecipientAddress by SenderAddress Subject
| eval emailsent=(RecipientAddress)
| eventstats avg(emailsent) as avg stdev(emailsent) as stdev by SenderAddress
| stats max(emailsent) as maxsent, values(avg) as avgpersender, values(stdev) as stdevpersender by SenderAddress
| eval threshold = 4 * ( stdevpersender + avgpersender ) | where maxsent > threshold
Basically, I'm trying to calculate the baseline of each sender and then isolate the outliers who are sending greater than four times their standard deviation to show me potential indicators of compromise.
Try this -
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| stats count by SenderAddress Subject
| eventstats avg(count) as AvgOfCount, stdev(count) as StdevOfCount, max(count) as MaxOfCount by SenderAddress
| eval KeepMe = if(count > AvgOfCount + 4 * StdevOfCount, 1,0)
| search KeepMe = 1
That should return you the email events that exceed 4 SDs from the sender's usual mailing list size.
Cripes, I reversed the test. Keep the ones that are GREATER than the threshold. Duh. Fixed.
Thx for the search. When I run this against 'Last 24 hours' I get 14,477 events, whereas running the search below returns six:
| stats count as RecipientAddress by SenderAddress Subject
| eval emailsent=(RecipientAddress)
| eventstats avg(emailsent) as avg stdev(emailsent) as stdev by SenderAddress
| stats max(emailsent) as maxsent, values(avg) as avgpersender, values(stdev) as stdevpersender by SenderAddress,Subject
| eval threshold = 4 * ( stdevpersender + avgpersender ) | where maxsent > threshold | sort -threshold
Hi @jwalzerpitt - Did your answer provide a working solution to your question? If yes and you would like to close out your post, don't forget to click "Accept". But if you'd like to keep it open for possibilities of other answers/feedback, you don't have to take action on it yet. Oh and don't forget to upvote any answers or comments that were especially helpful. Thanks!
I'm going to keep this open for a few more days to see if anyone else can offer perhaps a more efficient way or perhaps provide feedback on the search I provided.
Thx
This tests to make sure that the emails have the same subject and was sent in the same minute.
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| eval TheMinute = relative_time(_time,"@m")
| stats count (RecipientAddress) as RecipientCount by SenderAddress Subject TheMinute
| where RecipientCount > 1
If you also want to check the overall count, you could use eventstats to add it to the summary records. This lets through records for all emails where the sender sent more than 1 of the same in the same minute, or more than 100 emails in the same day.
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count (RecipientAddress) as RecipientCount by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientCount > 1 or SenderCount > 100
Thx
Slightly modified the query as follows:
| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count as RecipientAddress by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientAddress > 1 or SenderCount > 100
and I'm getting events returned.
Splunk gave me an error when the query had, "| stats count (RecipientAddress) as RecipientCount by SenderAddress Subject TheMinute", "Error in 'stats' command: The argument '(RecipientAddress)' is invalid."
I'd still like to pull some query variables from another search I have that will help me find the avg and stdev of emails sent per Sender to recipients and then look for stdev that are 4x the threshold. From my other query I have:
| eventstats avg(RecipientAddress) as avg stdev(RecipientAddress) as stdev by SenderAddress | stats max(RecipientAddress) as maxsent, values(avg) as avgpersender, values(stdev) as stdevpersender by SenderAddress, RecipientAddress | eval threshold = 4 * ( RecipientAddress + avgpersender ) | where maxsent > threshold
Sounds like the field name is not right.
Try your query with | head 5 and look at the results.
What is the field name for the email address of the recipient?
Actually, can you post the list of interesting fields?
I get:
FromIP
messageId
Received
ReceipientAddress
SenderAddress
Status
Subject
ToIP
ah, that's the problem - I spelled RecipientAddress wrong. Well, wrong as per your extract names. 😉
Try this -
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count (ReceipientAddress) as RecipientCount by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientCount > 1 or SenderCount > 100
That's my bad on the misspelling!
Ran the following, but still no events were returned:
| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count as RecipientAddress by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientCount > 1 or SenderCount > 100
Please note I had to change '| stats count (ReceipientAddress) as RecipientCount by SenderAddress Subject TheMinute' to '| stats count as RecipientAddress by SenderAddress Subject TheMinute' as I get the same error message as before, '"Error in 'stats' command: The argument '(RecipientAddress)' is invalid." (with correct spelling of RecipientAddress)
Hmm. I was looking for what "interesting fields" were present in the original events, before any eval statements. Somewhere, there has to be a recipientaddress field of some kind.
OK, let's end this thread and start on your next one.
Thx for the reply.
I tried the | eval theMinute = relative_time(_time,"@m") | stats count as RecipientAddress by SenderAddress Subject theMinute | where count > 1
but I'm not seeing any results. Even modified the "@m" to various number, @5m, @1h, etc, but no events are returned.
Testing the query, I ran just the search and | stats count as RecipientAddress by SenderAddress Subject - and I see SenderAddresses with a count > 1 with the same Subject
I think I would need a little more information in order to answer your question fully with regards to using functions avg
or stdev
on your data.
To simply see how many recipients are receiving the same message, you can slightly modify your current query. You need to substitute your fieldname for the subject field
index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com
| stats dc(ReceipientAddress) as count by SenderAddress <SUBJECT_FIELD>
| fields SenderAddress <SUBJECT_FIELD> count
Thx for the reply.
Running search... | stats dc(ReceipientAddress) as count by SenderAddress,Subject | fields SenderAddress Subject count - brings back a count of 0 for each event