- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to use a CSV file lookup to filter out data?
Hello,
I'm trying to build a search against our DNS records, and I have a CSV file that contains a whitelist of domains that I want to filter out from the results. The format of the file is just:
company.com
I've been attempting to use lookup/inputlookup for this purpose, but it doesn't seem to be working (which is probably more a lack of Splunk skills on my part). But essentially what I'm trying to do is:
index=dns |
(if query field contains domain in whitelist_domains.csv then don't display it)
From what I've come across online, the topics are either using a csv file to only show the results that are in the file, or use the first column as the input and the second as the output. But nothing appears to cover what I'm trying to do; not including the results that are in the CSV file.
I'm also not sure if I have to do something like:
*.company.com
in the CSV file, since the queries are going to be more like: server.company.com
If I just do a search without the csv file:
index=dns | query!=*.company.com
It works as I expect it to, but then I have all these query!=.....
entries in the search box.
Thanks ahead of time....
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I like @chuckers suggestion that you have a second column, but for a different reason. My original answer works great if there aren't too many domains (less than a hundred). But if there are more, you need a different approach. Given a lookup CSV file like this
domain,flag
company.com,0
comp2.com,0
etc
Upload the file to Splunk and set up a lookup, which I will call domain_lookup. Be sure to set a "default" for the lookup of "Unknown" - this is what will be returned if there is no match. Then search like this
index=dns
| lookup domain_lookup domain OUTPUT flag | where flag = "Unknown"
This will eliminate any events that have the matching (white-listed) domains.This solution will work well no matter the number of white-listed domains. It will out-perform other solutions that use append or sub-searches.
Again, the above solution assumes that the field in the events is also called domain and matches the format of the lookup file. If the events in the dns index have a different format, you will need to insert steps in between lines 1 and 2, to create a field named domain that matches the lookup format.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried this approach, but now I'm not getting any results. Which makes me wonder if it's an issue with my csv file.
But all that's in it is query (or using your example domain) in the header row, and then I have a flag field.
Does splunk still take into account that the domain list I'm using won't be an exact match for the query field? So it's treating it as if I'm doing query!=*.domain.com ?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I am not sure where the "query" comes from. Is that the field in the DNS log that you want to test against? If yes, and the query field contains a DNS name that is more than just the domain, then you should probably do something like this:
index=dns
| eval domain = replace(query, ".*(.*?\..*?)$", "\1")
| lookup domain_lookup domain OUTPUT flag | where flag = "Unknown"
The second line should pick off the end of the query that represents the domain name and assign it to the domain field.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried that search and it still isn't giving me any results. But here's a sample of the event I'm trying to filter with looks like (I've removed irrelevant fields and changed names to protect the innocent)
sourcetype MSAD:NT6:DNS
context PACKET
dest SERVER
direction Rcv
eventtype nt6-dns-events(dns network resolution)
flags D
hexflags 0001
message a host address
message_type Query
message_type_code A
opcode Q
packetid 000000DFFFFFF
query www.microsoft.com
query_type Query
record_type A
reply_code NOERROR
reply_code_description No Error
reply_code_id 0
src 1.2.3.4
src_ip 1.2.3.4
tag dns
network resolution
threadid 0FFF
transport UDP
_time 2006-01-1T00:00:00.000+00:00
index dns
linecount 1
So that query field is what I'm trying to match against.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

First, a CSV file that is used for lookup must have a header row. Assuming that the header is simply "domain" - then the field name for the lookup is "domain" and the following will work
index=dns NOT [ inputlookup whitelist_domains.csv]
Note that the domain field in the CSV must exactly match the field name and the field format in order of this to work.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes it did have a header row. I've played around with it to see if maybe that's the problem. So currently it has "query" (without quotes) in the header row, which is the field I'm trying to match the domain against in the DNS event logs.
I've actually tried the example you suggested, and I'm still getting results that include the domains I'm trying to whitelist.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

This may be because the data in the dns index is more than just the domain name: perhaps it is something like: xyz.someserver.edu - while the lookup file only contains someserver.edu
Also, this search is case-sensitive. So "someserver.edu" will not match "someserver.EDU"
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes that's exactly the problem I'm dealing with. The query line will be something like
query="server.domain.com" (without quotes)
and I'm trying to just filter on "domain.com"
Essentially what I'm trying to do is a query!=*.company.com but pull the "company.com" part from the csv file.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This might be kind of clunky as I am a bit of a beginner as well...
I suggest adding a second column to your CSV file (I recall the documentation saying that a csv file must have at least 2 columns anyway.). Like this
DOMAIN, FLAG
company.com,0
comp2.com,0
comp3.com,0
The value in your new, second column is the same for every row (in this example "zero")
| inputlookup whitelist_domains.csv | rename DOMAIN AS domain
| append
[search yourQueryHere | stats count BY domain]
| stats sum(count) AS Count, values(FLAG) AS Flag BY domain
| where isnull(Flag)
This will give you a list of the domain that are not in your whitelist (along with a count of events if you care)
