Splunk Search

Find values that only match a specific list of values and nothing else

JeffBothel
Explorer

I am working with an event log from an email system in which all the recipients of an email are listed in one field, and I have converted that field from a single-value field to a multivalue field. What I am trying to do is match the recipients' domains against a list of approved domains, and only those domains. I am having difficulty because if I specify something like the following:

index=email | makemv delim="," recipient | search recipient=*domain1.com OR recipient=*domain2.com

I end up getting every email that has either of those domains anywhere in the recipient list, including emails that contain domain1.com, domain2.com, and domain3.com (domain3.com being unacceptable here). I want to restrict the search to only those events whose addresses all come from the list of domains approved for sending information. Any email that contains those domains together with other domains not on the list should be excluded. Going back to my example, here is a simple illustration of what I am looking for:

included - recipient=user@domain1.com,user@domain2.com 
included - recipient=user@domain1.com
included - recipient=user@domain2.com
excluded - recipient=user@domain1.com,user@domain3.com
excluded - recipient=user@domain3.com
excluded - recipient=user@domain2.com,user@domain3.com

Any ideas on how to achieve this would be appreciated.

1 Solution

knielsen
Contributor

I am not sure how performant that is...

| makeresults | eval recipient="user@domain1.com,user@domain3.com-user@domain1.com-user@domain1.com,user@domain3.com-user@domain2.com" | makemv delim="-" recipient | mvexpand recipient | makemv delim="," recipient 

| rex field=recipient "[^@]+@(?<domain>.+)" | eval match=mvfilter(match(domain,"domain1.com") OR match(domain, "domain2.com")) | eval result=if(mvcount(domain)=mvcount(match),"included","excluded")

The first part of the above just generates some sample input. The logic is in the second part: I build a list of matches between your input and your fixed domains, and then I check whether the number of matched domains equals the number of domains in your input. I create a result field so you can see everything in a table, but you could of course append a where clause or a search to it instead.

That seems to work, but I am sure there are other approaches as well.

JeffBothel
Explorer

One thing to note is that I am also hoping to do this against a table of domain values. I have a list of approved domains to compare against, but there are hundreds of them, and putting them all in a single search would be difficult. Based on what you provided, I see this as similar to the following programmatic approach:

foreach(emailevent in searchresults){
        foreach(recipient in emailevent){
                // discard the email if this recipient's domain is not in the approved list
                if(domain(recipient) not in approveddomains){
                        discardemail = true
                }
        }
}

That's essentially what I am looking to accomplish with the Splunk search, so that only the emails whose recipients are all in approved domains are returned for evaluation.

knielsen
Contributor

Now I think I nailed it.

I uploaded a .csv with this content:

allowed
domain1.com
domain2.com
domain4.com

and created a lookup called "allowed" with that (I should have chosen better names for the lookup and the field; I was confused by that myself later).

This is used in a query like this:

| makeresults | eval recipient="user@domain1.com,user@domain2.com-user@domain1.com-user@domain1.com,user@domain3.com-user@domain2.com-user1@domain2.com,user2@domain2.com-bla@domain4.com-bla@domain4.com,user@domain3.com-user@domain4.com,user@domain1.com" | makemv delim="-" recipient | mvexpand recipient | eval orig_to=recipient | makemv delim="," recipient 
| rex field=recipient "^[^@]+@(?<domains>.+)$"  | eval domains=mvdedup(domains) | lookup allowed allowed as domains OUTPUT allowed AS matched | eval result=if(mvcount(domains)=mvcount(matched),"included","excluded") | fields - domains matched

And the result looks fine to me, I hope I got all cases covered:

(screenshot of the results table)

I went through some terrible iterations, but I hope this one is correct. 🙂
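
For use against real events rather than the generated sample, the same lookup comparison can be turned into a filter. This is a minimal sketch, assuming the `index=email` source and comma-separated `recipient` field from the original question and the `allowed` lookup described above; it has not been run against real data:

```spl
index=email
| makemv delim="," recipient
| rex field=recipient "^[^@]+@(?<domains>.+)$"
| eval domains=mvdedup(domains)
| lookup allowed allowed AS domains OUTPUT allowed AS matched
| where mvcount(domains)=mvcount(matched)
```

An event with no approved domain at all gets no matched field, so the comparison evaluates to null and where drops it, just as it drops a partial match.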

DalJeanis
SplunkTrust

@knielsen - That code looks okay, but you can easily make it more performant. Since you don't need the domain as a separate field, you can skip the rex and apply the mvfilter directly to recipient, writing the result into a new field. Put "@" at the beginning of the pattern and an end anchor "$" at the end to make the match more efficient.
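
One way to read this suggestion, as a sketch rather than a tested search: match each recipient value against the approved domains directly and compare counts. The `index=email` source and `recipient` field come from the original question, the two domains are the example ones, and the dots are escaped so that they match literally.

```spl
index=email
| makemv delim="," recipient
| eval match=mvfilter(match(recipient,"@domain1\.com$") OR match(recipient,"@domain2\.com$"))
| where mvcount(recipient)=mvcount(match)
```

With hundreds of approved domains the pattern list becomes unwieldy, so this form suits a short fixed list; the lookup version is likely easier to maintain for a long one.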
