Splunk Search

How to use a CSV file lookup to filter out data?

New Member

Hello,
I'm trying to build a search against our DNS records, and I have a CSV file that contains a whitelist of domains that I want to filter out from the results. The format of the file is just:

company.com

I've been attempting to use lookup/inputlookup for this purpose, but it doesn't seem to be working (which is probably more a lack of Splunk skills on my part). But essentially what I'm trying to do is:

index=dns | (if query field contains domain in whitelist_domains.csv then don't display it)

From what I've come across online, the topics are either using a csv file to only show the results that are in the file, or use the first column as the input and the second as the output. But nothing appears to cover what I'm trying to do; not including the results that are in the CSV file.

I'm also not sure if I have to do something like:

*.company.com

in the CSV file, since the queries are going to be more like: server.company.com

If I just do a search without the csv file:

index=dns | query!=*.company.com

It works as I expect it to, but then I have all these query!=..... entries in the search box.

Thanks ahead of time....

0 Karma

Legend

I like @chuckers suggestion that you have a second column, but for a different reason. My original answer works great if there aren't too many domains (less than a hundred). But if there are more, you need a different approach. Given a lookup CSV file like this

domain,flag
company.com,0
comp2.com,0
etc

Upload the file to Splunk and set up a lookup, which I will call domain_lookup. Be sure to set a "default" for the lookup of "Unknown" - this is what will be returned if there is no match. Then search like this

index=dns
| lookup domain_lookup domain OUTPUT flag | where flag = "Unknown"

This will eliminate any events that have the matching (white-listed) domains.This solution will work well no matter the number of white-listed domains. It will out-perform other solutions that use append or sub-searches.

Again, the above solution assumes that the field in the events is also called domain and matches the format of the lookup file. If the events in the dns index have a different format, you will need to insert steps in between lines 1 and 2, to create a field named domain that matches the lookup format.

New Member

I tried this approach, but now I'm not getting any results. Which makes me wonder if it's an issue with my csv file.

But all that's in it is query (or using your example domain) in the header row, and then I have a flag field.

Does splunk still take into account that the domain list I'm using won't be an exact match for the query field? So it's treating it as if I'm doing query!=*.domain.com ?

0 Karma

Legend

I am not sure where the "query" comes from. Is that the field in the DNS log that you want to test against? If yes, and the query field contains a DNS name that is more than just the domain, then you should probably do something like this:

 index=dns
 | eval domain = replace(query, ".*(.*?\..*?)$", "\1")
 | lookup domain_lookup domain OUTPUT flag | where flag = "Unknown"

The second line should pick off the end of the query that represents the domain name and assign it to the domain field.

0 Karma

New Member

I tried that search and it still isn't giving me any results. But here's a sample of the event I'm trying to filter with looks like (I've removed irrelevant fields and changed names to protect the innocent)

sourcetype MSAD:NT6:DNS

context PACKET

dest SERVER
direction Rcv
eventtype nt6-dns-events(dns network resolution)

flags D

hexflags 0001

message a host address

message_type Query

message_type_code A

opcode Q

packetid 000000DFFFFFF

query www.microsoft.com

query_type Query

record_type A

reply_code NOERROR
reply_code_description No Error

reply_code_id 0

src 1.2.3.4
src_ip 1.2.3.4
tag dns
network resolution

threadid 0FFF

transport UDP
_time 2006-01-1T00:00:00.000+00:00

index dns
linecount 1

So that query field is what I'm trying to match against.

0 Karma

Legend

First, a CSV file that is used for lookup must have a header row. Assuming that the header is simply "domain" - then the field name for the lookup is "domain" and the following will work

index=dns NOT [ inputlookup  whitelist_domains.csv]

Note that the domain field in the CSV must exactly match the field name and the field format in order of this to work.

New Member

Yes it did have a header row. I've played around with it to see if maybe that's the problem. So currently it has "query" (without quotes) in the header row, which is the field I'm trying to match the domain against in the DNS event logs.

I've actually tried the example you suggested, and I'm still getting results that include the domains I'm trying to whitelist.

0 Karma

Legend

This may be because the data in the dns index is more than just the domain name: perhaps it is something like: xyz.someserver.edu - while the lookup file only contains someserver.edu

Also, this search is case-sensitive. So "someserver.edu" will not match "someserver.EDU"

0 Karma

New Member

Yes that's exactly the problem I'm dealing with. The query line will be something like

query="server.domain.com" (without quotes)

and I'm trying to just filter on "domain.com"

Essentially what I'm trying to do is a query!=*.company.com but pull the "company.com" part from the csv file.

0 Karma

Path Finder

This might be kind of clunky as I am a bit of a beginner as well...

I suggest adding a second column to your CSV file (I recall the documentation saying that a csv file must have at least 2 columns anyway.). Like this
DOMAIN, FLAG
company.com,0
comp2.com,0
comp3.com,0

The value in your new, second column is the same for every row (in this example "zero")

| inputlookup whitelist_domains.csv | rename DOMAIN AS domain

| append
[search yourQueryHere | stats count BY domain]

| stats sum(count) AS Count, values(FLAG) AS Flag BY domain
| where isnull(Flag)

This will give you a list of the domain that are not in your whitelist (along with a count of events if you care)

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!