Splunk Search

How to add an "eval" command to my search in order to apply "eventstats" variables?

jwalzerpitt
Influencer

I have the following search for my email in which I pull the number of events per Recipient Address by Sender Address (basically looking for how many emails each user sends):

index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com | stats count as ReceipientAddress by SenderAddress

I'm trying to accomplish two things things: 1) Pipe that out to eval for a new column called 'emailsent' so I can then apply eventstats variables (avg, stdev) as I'm looking to create a search that shows me users who are sending 2) the same Subject to multiple recipients

How do I add an eval command for 'emailsent' and how would I go about making sure the emails sent to multiple users have the same Subject?

Thx

0 Karma
1 Solution

DalJeanis
Legend

If someone normally sends 1 email 90 percent of the time, but occasionally sends 5 or 10, it would trip the threshold. So, let's throw out all the single emails before we calculate avg and SD. And let's round the avg and stdev up using the ceiling function.

Also, let's set an absolute lower bound as well. Say, the person has to send at least 10 emails before tripping the rule.

Try this -

 index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | stats count by SenderAddress Subject 
 | search count > 1
 | eventstats avg(count) as AvgOfCount, stdev(count) as StdevOfCount, max(count) as MaxOfCount by SenderAddress 
| eval AvgOfCount = ceiling(AvgOfCount)
| eval StdevOfCount=ceiling(StdevOfCount)
 | threshold = max(AvgOfCount + 4 * StdevOfCount,10)
 | eval KeepMe = if(count >= threshold, 1,0)
 | search KeepMe=1

Cripes, I reversed the test. Keep the ones that are GREATER than the threshold. Duh. Fixed.

View solution in original post

DalJeanis
Legend

If someone normally sends 1 email 90 percent of the time, but occasionally sends 5 or 10, it would trip the threshold. So, let's throw out all the single emails before we calculate avg and SD. And let's round the avg and stdev up using the ceiling function.

Also, let's set an absolute lower bound as well. Say, the person has to send at least 10 emails before tripping the rule.

Try this -

 index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | stats count by SenderAddress Subject 
 | search count > 1
 | eventstats avg(count) as AvgOfCount, stdev(count) as StdevOfCount, max(count) as MaxOfCount by SenderAddress 
| eval AvgOfCount = ceiling(AvgOfCount)
| eval StdevOfCount=ceiling(StdevOfCount)
 | threshold = max(AvgOfCount + 4 * StdevOfCount,10)
 | eval KeepMe = if(count >= threshold, 1,0)
 | search KeepMe=1

Cripes, I reversed the test. Keep the ones that are GREATER than the threshold. Duh. Fixed.

jwalzerpitt
Influencer

Thx for the new search. I ran that and I'm down to 102 events, which is a great reduction.

However, let me throw another wrench into this if you don't mind 🙂

Right now when I pick a time from the time picker, I'm limiting the average to that exact time, but what I wanted to get a rolling average. For example, say I have a user who if I ran a search with eventstats avg(emailsent) for all-time returns an average of say 3.2. Running the same search but for the last 24 hours, the average is now 6.82.

if possible, I'd like to look at the average for the user for a larger time period (say 3 or 6 months) to get a more accurate look at their average, which would help smooth out some spikes. Would that be possible

0 Karma

DalJeanis
Legend

Sure, anything is possible.

Tell you what, I'll promote the above comment (that worked) to an answer, you can accept it and we'll close this question.

Post a new thread with this as the starting search, asking how to get the average over a greater time. Basically you'll move all of that into a subsearch, then inner join only the current stats for the shorter term to the values returned from the subsearch.

0 Karma

jwalzerpitt
Influencer

Sounds good - thx

0 Karma

jwalzerpitt
Influencer

Posted question as, "How to create a search for moving average for email sent per user"

Thx

0 Karma

jwalzerpitt
Influencer

After mucking around a bit, I believe I came up with a decent search. Search is as follows:

| stats count as RecipientAddress by SenderAddress Subject
| eval emailsent=(RecipientAddress)
| eventstats avg(emailsent) as avg stdev(emailsent) as stdev by SenderAddress
| stats max(emailsent) as maxsent, values(avg) as avgpersender, values(stdev) as stdevpersender by SenderAddress
| eval threshold = 4 * ( stdevpersender + avgpersender ) | where maxsent > threshold

Basically, I'm trying to calculate the baseline of each sender and then isolate the outliers who are sending greater than four times their standard deviation to show me potential indicators of compromise.

0 Karma

DalJeanis
Legend

Try this -

index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
| stats count by SenderAddress Subject 
| eventstats avg(count) as AvgOfCount, stdev(count) as StdevOfCount, max(count) as MaxOfCount by SenderAddress 
| eval KeepMe = if(count > AvgOfCount + 4 * StdevOfCount, 1,0)
| search KeepMe = 1

That should return you the email events that exceed 4 SDs from the sender's usual mailing list size.


Cripes, I reversed the test. Keep the ones that are GREATER than the threshold. Duh. Fixed.

jwalzerpitt
Influencer

Thx for the search. When I run this against 'Last 24 hours' I get 14,477 events, whereas running the search below returns six:

| stats count as RecipientAddress by SenderAddress Subject
| eval emailsent=(RecipientAddress)
| eventstats avg(emailsent) as avg stdev(emailsent) as stdev by SenderAddress
| stats max(emailsent) as maxsent, values(avg) as avgpersender, values(stdev) as stdevpersender by SenderAddress,Subject
| eval threshold = 4 * ( stdevpersender + avgpersender ) | where maxsent > threshold | sort -threshold

0 Karma

aaraneta_splunk
Splunk Employee
Splunk Employee

Hi @jwalzerpitt - Did your answer provide a working solution to your question? If yes and you would like to close out your post, don't forget to click "Accept". But if you'd like to keep it open for possibilities of other answers/feedback, you don't have to take action on it yet. Oh and don't forget to upvote any answers or comments that were especially helpful. Thanks!

0 Karma

jwalzerpitt
Influencer

I'm going to keep this open for a few more days to see if anyone else can offer perhaps a more efficient way or perhaps provide feedback on the search I provided.

Thx

0 Karma

DalJeanis
Legend

This tests to make sure that the emails have the same subject and was sent in the same minute.

 index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
| eval TheMinute = relative_time(_time,"@m")
| stats count (RecipientAddress) as RecipientCount by SenderAddress Subject TheMinute
| where RecipientCount > 1

If you also want to check the overall count, you could use eventstats to add it to the summary records. This lets through records for all emails where the sender sent more than 1 of the same in the same minute, or more than 100 emails in the same day.

 index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count (RecipientAddress) as RecipientCount by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientCount > 1 or SenderCount > 100

jwalzerpitt
Influencer

Thx

Slightly modified the query as follows:

| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count as RecipientAddress by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientAddress > 1 or SenderCount > 100

and I'm getting events returned.

Splunk gave me an error when the query had, "| stats count (RecipientAddress) as RecipientCount by SenderAddress Subject TheMinute", "Error in 'stats' command: The argument '(RecipientAddress)' is invalid."

I'd still like to pull some query variables from another search I have that will help me find the avg and stdev of emails sent per Sender to recipients and then look for stdev that are 4x the threshold. From my other query I have:

| eventstats avg(RecipientAddress) as avg stdev(RecipientAddress) as stdev by SenderAddress | stats max(RecipientAddress) as maxsent, values(avg) as avgpersender, values(stdev) as stdevpersender by SenderAddress, RecipientAddress | eval threshold = 4 * ( RecipientAddress + avgpersender ) | where maxsent > threshold

0 Karma

DalJeanis
Legend

Sounds like the field name is not right.

Try your query with | head 5 and look at the results.

What is the field name for the email address of the recipient?

Actually, can you post the list of interesting fields?

0 Karma

jwalzerpitt
Influencer

I get:

FromIP
messageId
Received
ReceipientAddress
SenderAddress
Status
Subject
ToIP

0 Karma

DalJeanis
Legend

ah, that's the problem - I spelled RecipientAddress wrong. Well, wrong as per your extract names. 😉

Try this -

  index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
 | eval TheMinute = relative_time(_time,"@m")
 | eval TheDay = relative_time(_time,"@d")
 | stats count (ReceipientAddress) as RecipientCount by SenderAddress Subject TheMinute
 | eventstats count as SenderCount by SenderAddress TheDay
 | where RecipientCount > 1 or SenderCount > 100

jwalzerpitt
Influencer

That's my bad on the misspelling!

Ran the following, but still no events were returned:

| eval TheMinute = relative_time(_time,"@m")
| eval TheDay = relative_time(_time,"@d")
| stats count as RecipientAddress by SenderAddress Subject TheMinute
| eventstats count as SenderCount by SenderAddress TheDay
| where RecipientCount > 1 or SenderCount > 100

Please note I had to change '| stats count (ReceipientAddress) as RecipientCount by SenderAddress Subject TheMinute' to '| stats count as RecipientAddress by SenderAddress Subject TheMinute' as I get the same error message as before, '"Error in 'stats' command: The argument '(RecipientAddress)' is invalid." (with correct spelling of RecipientAddress)

0 Karma

DalJeanis
Legend

Hmm. I was looking for what "interesting fields" were present in the original events, before any eval statements. Somewhere, there has to be a recipientaddress field of some kind.

OK, let's end this thread and start on your next one.

0 Karma

jwalzerpitt
Influencer

Thx for the reply.

I tried the | eval theMinute = relative_time(_time,"@m") | stats count as RecipientAddress by SenderAddress Subject theMinute | where count > 1

but I'm not seeing any results. Even modified the "@m" to various number, @5m, @1h, etc, but no events are returned.

Testing the query, I ran just the search and | stats count as RecipientAddress by SenderAddress Subject - and I see SenderAddresses with a count > 1 with the same Subject

0 Karma

rjthibod
Champion

I think I would need a little more information in order to answer your question fully with regards to using functions avg or stdev on your data.

To simply see how many recipients are receiving the same message, you can slightly modify your current query. You need to substitute your fieldname for the subject field

index=email NOT Status=Quarantined NOT Status=Failed SenderAddress=*.xyz.com 
| stats dc(ReceipientAddress) as count by SenderAddress  <SUBJECT_FIELD>
| fields SenderAddress <SUBJECT_FIELD> count
0 Karma

jwalzerpitt
Influencer

Thx for the reply.

Running search... | stats dc(ReceipientAddress) as count by SenderAddress,Subject | fields SenderAddress Subject count - brings back a count of 0 for each event

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...