I'm trying to build a search against our DNS records, and I have a CSV file that contains a whitelist of domains that I want to filter out from the results. The format of the file is just:
I've been attempting to use lookup/inputlookup for this purpose, but it doesn't seem to be working (which is probably more a lack of Splunk skills on my part). But essentially what I'm trying to do is:
index=dns | (if query field contains domain in whitelist_domains.csv then don't display it)
From what I've come across online, the topics are either using a csv file to only show the results that are in the file, or use the first column as the input and the second as the output. But nothing appears to cover what I'm trying to do; not including the results that are in the CSV file.
I'm also not sure if I have to do something like:
in the CSV file, since the queries are going to be more like: server.company.com
If I just do a search without the csv file:
index=dns | query!=*.company.com
It works as I expect it to, but then I have all these
query!=..... entries in the search box.
Thanks ahead of time....
I like @chuckers suggestion that you have a second column, but for a different reason. My original answer works great if there aren't too many domains (less than a hundred). But if there are more, you need a different approach. Given a lookup CSV file like this
Upload the file to Splunk and set up a lookup, which I will call domain_lookup. Be sure to set a "default" for the lookup of "Unknown" - this is what will be returned if there is no match. Then search like this
index=dns | lookup domain_lookup domain OUTPUT flag | where flag = "Unknown"
This will eliminate any events that have the matching (white-listed) domains.This solution will work well no matter the number of white-listed domains. It will out-perform other solutions that use append or sub-searches.
Again, the above solution assumes that the field in the events is also called domain and matches the format of the lookup file. If the events in the dns index have a different format, you will need to insert steps in between lines 1 and 2, to create a field named domain that matches the lookup format.
I tried this approach, but now I'm not getting any results. Which makes me wonder if it's an issue with my csv file.
But all that's in it is query (or using your example domain) in the header row, and then I have a flag field.
Does splunk still take into account that the domain list I'm using won't be an exact match for the query field? So it's treating it as if I'm doing query!=*.domain.com ?
I am not sure where the "query" comes from. Is that the field in the DNS log that you want to test against? If yes, and the query field contains a DNS name that is more than just the domain, then you should probably do something like this:
index=dns | eval domain = replace(query, ".*(.*?\..*?)$", "\1") | lookup domain_lookup domain OUTPUT flag | where flag = "Unknown"
The second line should pick off the end of the query that represents the domain name and assign it to the domain field.
I tried that search and it still isn't giving me any results. But here's a sample of the event I'm trying to filter with looks like (I've removed irrelevant fields and changed names to protect the innocent)
eventtype nt6-dns-events(dns network resolution)
message a host address
reply_code_description No Error
So that query field is what I'm trying to match against.
First, a CSV file that is used for lookup must have a header row. Assuming that the header is simply "domain" - then the field name for the lookup is "domain" and the following will work
index=dns NOT [ inputlookup whitelist_domains.csv]
Note that the domain field in the CSV must exactly match the field name and the field format in order of this to work.
Yes it did have a header row. I've played around with it to see if maybe that's the problem. So currently it has "query" (without quotes) in the header row, which is the field I'm trying to match the domain against in the DNS event logs.
I've actually tried the example you suggested, and I'm still getting results that include the domains I'm trying to whitelist.
This may be because the data in the dns index is more than just the domain name: perhaps it is something like: xyz.someserver.edu - while the lookup file only contains someserver.edu
Also, this search is case-sensitive. So "someserver.edu" will not match "someserver.EDU"
Yes that's exactly the problem I'm dealing with. The query line will be something like
query="server.domain.com" (without quotes)
and I'm trying to just filter on "domain.com"
Essentially what I'm trying to do is a query!=*.company.com but pull the "company.com" part from the csv file.
This might be kind of clunky as I am a bit of a beginner as well...
I suggest adding a second column to your CSV file (I recall the documentation saying that a csv file must have at least 2 columns anyway.). Like this
The value in your new, second column is the same for every row (in this example "zero")
| inputlookup whitelist_domains.csv | rename DOMAIN AS domain
[search yourQueryHere | stats count BY domain]
| stats sum(count) AS Count, values(FLAG) AS Flag BY domain
| where isnull(Flag)
This will give you a list of the domain that are not in your whitelist (along with a count of events if you care)