I am working with an event log from an email system in which all recipients of an email are listed in one field, and I have converted that field from a single-value field to a multivalue field. What I am trying to do is match each recipient's domain against a list of approved domains and keep only the events where every domain is approved. I am having difficulty because if I specify something like the following
index=email | makemv delim="," recipient | search recipient=*domain1.com OR recipient=*domain2.com
I end up getting every email in which at least one recipient matches either of those, including emails that contain domain1.com, domain2.com, and domain3.com together (domain3.com is not acceptable here). I want to restrict the search to events whose recipients all come from the list of domains approved for sending information; any email that contains an approved domain along with a domain not on that list should be excluded. Going back to my example, here is a simple illustration of what I am looking for:
included - recipient=user@domain1.com,user@domain2.com
included - recipient=user@domain1.com
included - recipient=user@domain2.com
excluded - recipient=user@domain1.com,user@domain3.com
excluded - recipient=user@domain3.com
excluded - recipient=user@domain2.com,user@domain3.com
Any ideas on how to achieve this would be appreciated.
I am not sure how performant that is...
| makeresults | eval recipient="user@domain1.com,user@domain3.com-user@domain1.com-user@domain1.com,user@domain3.com-user@domain2.com" | makemv delim="-" recipient | mvexpand recipient | makemv delim="," recipient
| rex field=recipient "[^@]+@(?<domain>.+)" | eval match=mvfilter(match(domain,"^domain1\.com$") OR match(domain,"^domain2\.com$")) | eval result=if(mvcount(domain)=mvcount(match),"included","excluded")
The first part of the above only generates some sample input. The logic is in the second part: I build a list of the domains in your input that match your fixed domains, and then I check whether the number of matched domains equals the number of domains in your input. I create a result field so you can see everything in a table, but you could of course append a where clause or a search instead.
That seems to work, but I am sure there are other approaches as well.
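To filter rather than label, the result field can be swapped for a where clause. This is a minimal sketch that reuses the field names domain and match from the search above. The match patterns are anchored and the dots escaped so that a value such as domain1.com.example would not count as approved. If nothing matches, mvcount(match) is null, the comparison is not true, and the event is dropped.

```spl
index=email
| makemv delim="," recipient
| rex field=recipient "[^@]+@(?<domain>.+)"
| eval match=mvfilter(match(domain,"^domain1\.com$") OR match(domain,"^domain2\.com$"))
| where mvcount(domain)=mvcount(match)
```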
One thing to note: I am hoping to also do this against a table of domain values. I have a list of approved domains to compare against, but there are hundreds of them, and having them all in a single search might be a bit difficult. Based on what you provided, I see this as similar to the following programmatic approach:
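foreach (emailevent in searchresults) {
    discardemail = false
    foreach (recipient in emailevent.recipients) {
        // discard the event if any recipient's domain is not on the approved list
        if (domainOf(recipient) not in approveddomains) {
            discardemail = true
        }
    }
}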
That's essentially what I am looking to accomplish with the Splunk search, so that only the emails with approved domains are returned for evaluation.
Now I think I nailed it.
I uploaded a .csv with this content:
allowed
domain1.com
domain2.com
domain4.com
and created a lookup called "allowed" with that (I should have chosen better names for the lookup and the field; I was confused by that later myself).
This is used in a query like this:
| makeresults | eval recipient="user@domain1.com,user@domain2.com-user@domain1.com-user@domain1.com,user@domain3.com-user@domain2.com-user1@domain2.com,user2@domain2.com-bla@domain4.com-bla@domain4.com,user@domain3.com-user@domain4.com,user@domain1.com" | makemv delim="-" recipient | mvexpand recipient | eval orig_to=recipient | makemv delim="," recipient
| rex field=recipient "^[^@]+@(?<domains>.+)$" | eval domains=mvdedup(domains) | lookup allowed allowed as domains OUTPUT allowed AS matched | eval result=if(mvcount(domains)=mvcount(matched),"included","excluded") | fields - domains matched
And the result looks fine to me, I hope I got all cases covered:
I went through some terrible iterations, but I hope this one is correct. 🙂
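Applied to the real data instead of the makeresults sample, the same steps might look like the sketch below. It assumes the index=email and comma-delimited recipient field from the question and the "allowed" lookup with its "allowed" field described above, and it filters with where instead of building a result field. Treat it as a starting point rather than a tested search.

```spl
index=email
| makemv delim="," recipient
| rex field=recipient "^[^@]+@(?<domains>.+)$"
| eval domains=mvdedup(domains)
| lookup allowed allowed AS domains OUTPUT allowed AS matched
| where mvcount(domains)=mvcount(matched)
| fields - domains matched
```

Events with no approved domain at all have a null matched field, so the comparison is not true and they are dropped as well.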
@knielsen - That code looks okay, but you can easily make it more performant. Since you don't need the domain as a separate field, you can bypass the rex and code the mvfilter to go directly against recipient, writing into a new field. Add "@" at the beginning and an end-anchor "$" at the end of the mask to make the match more efficient.
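A rough sketch of that suggestion for the two hard-coded domains follows. The field name approved is made up for illustration, and folding both domains into one alternation is my own shortcut. The "@" prefix and "$" suffix tie each pattern to the whole domain part of the address.

```spl
index=email
| makemv delim="," recipient
| eval approved=mvfilter(match(recipient,"@(domain1\.com|domain2\.com)$"))
| where mvcount(recipient)=mvcount(approved)
```

This compares recipient counts rather than deduplicated domain counts, so no mvdedup is needed. For a list of hundreds of domains, the lookup-based version is probably easier to maintain than one long regex.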