Splunk Search

Inputlookup against a list of bad domains

Splunk Employee
Splunk Employee

I have a question on doing a inputlookup, and cant figure out where my point of failure is I have a csv file located on my splunk server in
$SPLUNK_HOME/etc/apps/search/lookups/
$SPLUNK_HOME/etc/apps/lookups/
$SPLUNK_HOME/etc/system/lookups

This CSV contains a list of bad domains I'd like to search on.
It is quite large...over 5000 entries and a single column. CSV contents are like this:
domain
www.somedomain.com
www.somedomain2.com

Here is the search I try to do

index="someindex" [ inputlookup mal_domains.csv | fields domain | format ] 

I've also tried adding another column in the csv with two columns like

domain, status  
www.somedomain.com, bad  

index="someindex" [ inputlookup mal_domains.csv | fields domain, status | format ]

Still not getting any results, any ideas where this lookup is failing/or if my syntax is off in some way?

Tags (2)
1 Solution

Splunk Employee
Splunk Employee

I think you're on the right path. Hopefully we can get you closer.

The $SPLUNK_HOME/etc/apps/lookups/ path is ignored. Either of the other two directories should work though. The two files will not merge - one will triumph.

Step 1
We can first check the inputlookup is working by verifying Splunk uses it okay. We can enter this search as a test.

 |inputlookup mal_domains.csv

Let us say it returns a results table having names domain and status per your example. If not, the CSV is not being found or read appropriately.

Step 2
From your example & comment, I don't think you have any fields that relate to "bad" so let's get rid of that column

|inputlookup mal_domains.csv | fields + domain

Hopefully this is returning a results table having just the domain field.

Step 3
Let's say your Splunk data doesn't have a field called domain though. Instead, what if it is URL? Let's prepare to work against the URL field

|inputlookup mal_domains.csv | rename domain as URL | fields + URL

Hopefully the results table now looks like it did in Step 2, except with a URL field instead of domain field.

Step 4
Try to find Splunk data having a URL value matching a domain value from the mal_domains.csv file

* [|inputlookup mal_domains.csv | rename domain as URL | fields + URL]

View solution in original post

Contributor

I know this is a REALLY old question, but I had to implement something similar just now and thought I'd share a tip. Assuming that you're trying to just find any events that have a URL matching what's in the lookup table, I noticed a HUGE performance difference between the last example as given above vs using a join. For example, I was first trying:

sourcetype=dns [|inputlookup dns_watchlist.csv]

This took a considerable time to run, vs when I just did a search for sourcetype=dns over the same timerange. Looks like it was basically running the entire lookup table against each result. Since I only want results for events that contain something in the lookup table, I changed it to use a join:

sourcetype=dns | join query_value [|inputlookup dns_watchlist.csv]

This has a completely negligible impact on performance compared to the straight sourcetype=dns search.

Hope that maybe helps someone else looking to do something similar.

Splunk Employee
Splunk Employee

I suppose that's possible if sourcetype=dns doesn't have very many events and the dns_watchlist.csv is long, exposing inefficiencies when there are too many search tersm. But the subsearch should faster than join 99%+ of the time.

0 Karma

Splunk Employee
Splunk Employee

Here's a tricky (and efficient) way of doing it:

index=someindex | append [ inputlookup mal_domains.csv | eval index=anotherindex ] | stats count by domain,index | stats count by domain | where count > 1

Communicator

This does not seem very efficient to me. If your bad domains will only occur in a low percentage of answers, you're reading all those event from disk that will eventually be trashed at the end. Compare that to bwooden's answer, where you're acually searching for and returning only those events that contain the bad domain. In my experience, I/O is the most common limiting factor for Splunk

Additionally, at the end of this query, all you'd have are the domains that alerted. In my cases, I usually want to see the whole event.

0 Karma

Splunk Employee
Splunk Employee

I think you're on the right path. Hopefully we can get you closer.

The $SPLUNK_HOME/etc/apps/lookups/ path is ignored. Either of the other two directories should work though. The two files will not merge - one will triumph.

Step 1
We can first check the inputlookup is working by verifying Splunk uses it okay. We can enter this search as a test.

 |inputlookup mal_domains.csv

Let us say it returns a results table having names domain and status per your example. If not, the CSV is not being found or read appropriately.

Step 2
From your example & comment, I don't think you have any fields that relate to "bad" so let's get rid of that column

|inputlookup mal_domains.csv | fields + domain

Hopefully this is returning a results table having just the domain field.

Step 3
Let's say your Splunk data doesn't have a field called domain though. Instead, what if it is URL? Let's prepare to work against the URL field

|inputlookup mal_domains.csv | rename domain as URL | fields + URL

Hopefully the results table now looks like it did in Step 2, except with a URL field instead of domain field.

Step 4
Try to find Splunk data having a URL value matching a domain value from the mal_domains.csv file

* [|inputlookup mal_domains.csv | rename domain as URL | fields + URL]

View solution in original post

Explorer

Very clear, you help me cure my headache 🙂

Thanks a lot !

0 Karma

Splunk Employee
Splunk Employee

Thanks bwooden, makes a bit more sense now.

0 Karma

Splunk Employee
Splunk Employee

In my testing, I think the point i am missing is the "domain" and "status" fields need to exist as either search time or indexed fields for the inputlookup to run against?