Splunk Search

Inputlookup against a list of bad domains

Chris_R_
Splunk Employee

I have a question about doing an inputlookup, and I can't figure out where my point of failure is. I have a CSV file on my Splunk server in the following locations:
$SPLUNK_HOME/etc/apps/search/lookups/
$SPLUNK_HOME/etc/apps/lookups/
$SPLUNK_HOME/etc/system/lookups

This CSV contains a list of bad domains I'd like to search on.
It is quite large, over 5000 entries in a single column. The CSV contents look like this:
domain
www.somedomain.com
www.somedomain2.com

Here is the search I'm trying:

index="someindex" [ inputlookup mal_domains.csv | fields domain | format ] 

I've also tried adding a second column to the CSV, like this:

domain, status  
www.somedomain.com, bad  

index="someindex" [ inputlookup mal_domains.csv | fields domain, status | format ]

I'm still not getting any results. Any ideas where this lookup is failing, or whether my syntax is off in some way?

1 Solution

bwooden
Splunk Employee

I think you're on the right path. Hopefully we can get you closer.

The $SPLUNK_HOME/etc/apps/lookups/ path is ignored. Either of the other two directories should work, though. The two files will not merge; one will take precedence.
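
If you want to confirm which copy Splunk is actually resolving, one option (just a sketch, assuming you have permission to query the lookup-table-files REST endpoint) is:

 | rest /servicesNS/-/-/data/lookup-table-files | search title="mal_domains.csv" | table title eai:acl.app eai:acl.sharing

The app and sharing columns show where each copy lives, which should help you work out which one takes precedence.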

Step 1
First, we can check that the inputlookup itself is working by verifying Splunk can read the file. Enter this search as a test:

 |inputlookup mal_domains.csv

It should return a results table with domain and status columns, per your example. If it doesn't, the CSV is not being found or read correctly.
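
As a quick sanity check on size (a sketch; stats count simply counts the rows the lookup returns):

 |inputlookup mal_domains.csv | stats count

With your file, this should come back with a count over 5000; a much smaller number would suggest Splunk is reading a different copy of the file than you expect.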

Step 2
From your example and comment, I don't think any of your event fields relate to "bad", so let's get rid of that column:

|inputlookup mal_domains.csv | fields + domain

This should return a results table with just the domain field.

Step 3
Now suppose your Splunk data doesn't have a field called domain; what if it is called URL instead? Let's prepare to work against the URL field:

|inputlookup mal_domains.csv | rename domain as URL | fields + URL

The results table should now look like it did in Step 2, except with a URL field instead of a domain field.

Step 4
Finally, try to find Splunk data whose URL value matches a domain value from the mal_domains.csv file:

* [|inputlookup mal_domains.csv | rename domain as URL | fields + URL]
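
For what it's worth, Splunk implicitly formats subsearch results into OR'd field=value terms, so with the two example rows the outer search effectively expands to something like:

 * ( ( URL="www.somedomain.com" ) OR ( URL="www.somedomain2.com" ) )

This is also why the URL field must exist in your events at search time: the generated terms are field=value pairs, not raw-text matches.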


tmeader
Contributor

I know this is a REALLY old question, but I had to implement something similar just now and thought I'd share a tip. Assuming you're trying to find any events with a URL matching what's in the lookup table, I noticed a HUGE performance difference between the last example given above and using a join. For example, I first tried:

sourcetype=dns [|inputlookup dns_watchlist.csv]

This took considerable time to run, versus when I just did a search for sourcetype=dns over the same time range. It looks like it was basically running the entire lookup table against each result. Since I only want results for events that contain something in the lookup table, I changed it to use a join:

sourcetype=dns | join query_value [|inputlookup dns_watchlist.csv]

This has a completely negligible impact on performance compared to the straight sourcetype=dns search.

Hope that maybe helps someone else looking to do something similar.
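
For reference, another common pattern for this kind of filtering is the lookup command, which avoids both subsearch expansion and join. A sketch, assuming the CSV's column is named query_value so it matches the event field:

 sourcetype=dns | lookup dns_watchlist.csv query_value OUTPUT query_value AS watchlisted | where isnotnull(watchlisted)

Events with no match in the CSV get a null watchlisted field and are dropped by the where clause.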

carasso
Splunk Employee

I suppose that's possible if sourcetype=dns doesn't have very many events and dns_watchlist.csv is long, exposing inefficiencies when there are too many search terms. But the subsearch should be faster than join 99%+ of the time.


gkanapathy
Splunk Employee

Here's a tricky (and efficient) way of doing it:

index=someindex | append [ inputlookup mal_domains.csv | eval index=anotherindex ] | stats count by domain,index | stats count by domain | where count > 1

supersleepwalke
Communicator

This does not seem very efficient to me. If your bad domains occur in only a small percentage of events, you're reading all those events from disk only to throw them away at the end. Compare that to bwooden's answer, where you're actually searching for and returning only the events that contain a bad domain. In my experience, I/O is the most common limiting factor for Splunk.

Additionally, at the end of this query, all you'd have are the domains that alerted. In my case, I usually want to see the whole event.


gnoellbn
Explorer

Very clear, you helped me cure my headache 🙂

Thanks a lot!


Chris_R_
Splunk Employee

Thanks bwooden, makes a bit more sense now.


Chris_R_
Splunk Employee

In my testing, I think the point I was missing is that the "domain" and "status" fields need to exist as either search-time or indexed fields for the inputlookup subsearch to match against, correct?
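
That matches the behavior above: format turns the lookup rows into field=value search terms, so those fields must be extracted from your events for anything to match. You can inspect the generated terms by running the subsearch on its own:

 |inputlookup mal_domains.csv | fields domain | format

With the two example rows, this returns a single search field containing something like:

 ( ( domain="www.somedomain.com" ) OR ( domain="www.somedomain2.com" ) )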
