<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Inputlookup against a list of bad domains in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13406#M1242</link>
    <description>&lt;P&gt;I have a question about doing an inputlookup and can't figure out where my point of failure is.
I have a CSV file located on my Splunk server in&lt;BR /&gt;
$SPLUNK_HOME/etc/apps/search/lookups/&lt;BR /&gt;
$SPLUNK_HOME/etc/apps/lookups/&lt;BR /&gt;
$SPLUNK_HOME/etc/system/lookups  &lt;/P&gt;

&lt;P&gt;This CSV contains a list of bad domains I'd like to search on.&lt;BR /&gt;
It is quite large...over 5000 entries and a single column. CSV contents are like this:&lt;BR /&gt;
domain&lt;BR /&gt;
&lt;A href="https://community.splunk.com/www.somedomain.com" target="test_blank"&gt;www.somedomain.com&lt;/A&gt;&lt;BR /&gt;
&lt;A href="https://community.splunk.com/www.somedomain2.com" target="test_blank"&gt;www.somedomain2.com&lt;/A&gt;  &lt;/P&gt;

&lt;P&gt;Here is the search I am trying to run:  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="someindex" [ inputlookup mal_domains.csv | fields domain | format ] 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I've also tried adding a second column to the CSV, so it has two columns like this:  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;domain, status  
&lt;A href="https://community.splunk.com/www.somedomain.com" target="test_blank"&gt;www.somedomain.com&lt;/A&gt;, bad  

index="someindex" [ inputlookup mal_domains.csv | fields domain, status | format ]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'm still not getting any results. Any ideas where this lookup is failing, or whether my syntax is off in some way?&lt;/P&gt;</description>
    <pubDate>Thu, 13 May 2010 05:15:29 GMT</pubDate>
    <dc:creator>Chris_R_</dc:creator>
    <dc:date>2010-05-13T05:15:29Z</dc:date>
    <item>
      <title>Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13406#M1242</link>
      <description>&lt;P&gt;I have a question about doing an inputlookup and can't figure out where my point of failure is.
I have a CSV file located on my Splunk server in&lt;BR /&gt;
$SPLUNK_HOME/etc/apps/search/lookups/&lt;BR /&gt;
$SPLUNK_HOME/etc/apps/lookups/&lt;BR /&gt;
$SPLUNK_HOME/etc/system/lookups  &lt;/P&gt;

&lt;P&gt;This CSV contains a list of bad domains I'd like to search on.&lt;BR /&gt;
It is quite large...over 5000 entries and a single column. CSV contents are like this:&lt;BR /&gt;
domain&lt;BR /&gt;
&lt;A href="https://community.splunk.com/www.somedomain.com" target="test_blank"&gt;www.somedomain.com&lt;/A&gt;&lt;BR /&gt;
&lt;A href="https://community.splunk.com/www.somedomain2.com" target="test_blank"&gt;www.somedomain2.com&lt;/A&gt;  &lt;/P&gt;

&lt;P&gt;Here is the search I am trying to run:  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="someindex" [ inputlookup mal_domains.csv | fields domain | format ] 
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I've also tried adding a second column to the CSV, so it has two columns like this:  &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;domain, status  
&lt;A href="https://community.splunk.com/www.somedomain.com" target="test_blank"&gt;www.somedomain.com&lt;/A&gt;, bad  

index="someindex" [ inputlookup mal_domains.csv | fields domain, status | format ]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I'm still not getting any results. Any ideas where this lookup is failing, or whether my syntax is off in some way?&lt;/P&gt;</description>
      <pubDate>Thu, 13 May 2010 05:15:29 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13406#M1242</guid>
      <dc:creator>Chris_R_</dc:creator>
      <dc:date>2010-05-13T05:15:29Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13407#M1243</link>
      <description>&lt;P&gt;From my testing, I think the point I am missing is that the "domain" and "status" fields need to exist as either search-time or indexed fields for the inputlookup to run against?&lt;/P&gt;</description>
      <pubDate>Thu, 13 May 2010 06:14:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13407#M1243</guid>
      <dc:creator>Chris_R_</dc:creator>
      <dc:date>2010-05-13T06:14:46Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13408#M1244</link>
      <description>&lt;P&gt;I think you're on the right path.  Hopefully we can get you closer.  &lt;/P&gt;

&lt;P&gt;The $SPLUNK_HOME/etc/apps/lookups/ path is ignored.  Either of the other two directories should work though.  The two files will not merge - one will triumph.  &lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Step 1&lt;/STRONG&gt;&lt;BR /&gt;
We can first check that the lookup itself works by running inputlookup on its own.  Enter this search as a test:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; |inputlookup mal_domains.csv
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;It should return a results table with fields named &lt;EM&gt;domain&lt;/EM&gt; and &lt;EM&gt;status&lt;/EM&gt;, per your example.  If it does not, the CSV is not being found or read correctly.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Step 2&lt;/STRONG&gt;&lt;BR /&gt;
From your example &amp;amp; comment, I don't think you have any fields that relate to "bad", so let's get rid of that column:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;|inputlookup mal_domains.csv | fields + domain
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hopefully this is returning a results table having just the &lt;EM&gt;domain&lt;/EM&gt; field.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Step 3&lt;/STRONG&gt;&lt;BR /&gt;
Let's say your Splunk data doesn't have a field called &lt;EM&gt;domain&lt;/EM&gt;, though.  What if the field is called &lt;EM&gt;URL&lt;/EM&gt; instead?  Let's prepare to match against the &lt;EM&gt;URL&lt;/EM&gt; field:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;|inputlookup mal_domains.csv | rename domain as URL | fields + URL
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Hopefully the results table now looks like it did in Step 2, except with a &lt;EM&gt;URL&lt;/EM&gt; field instead of &lt;EM&gt;domain&lt;/EM&gt; field.&lt;/P&gt;
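
&lt;P&gt;Before wrapping this in a subsearch, it can help to preview the filter that the subsearch will hand to the outer search. This is only a sketch, assuming the two sample rows from the question; appending &lt;EM&gt;format&lt;/EM&gt; should collapse the table into a single &lt;EM&gt;search&lt;/EM&gt; field:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;|inputlookup mal_domains.csv | rename domain as URL | fields + URL | format
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;With those two rows I would expect the &lt;EM&gt;search&lt;/EM&gt; field to read roughly ( ( URL="www.somedomain.com" ) OR ( URL="www.somedomain2.com" ) ). That is why the field name coming out of the lookup has to match a field that actually exists on your events.&lt;/P&gt;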

&lt;P&gt;&lt;STRONG&gt;Step 4&lt;/STRONG&gt;&lt;BR /&gt;
Now find events whose &lt;EM&gt;URL&lt;/EM&gt; value matches a &lt;EM&gt;domain&lt;/EM&gt; value from the mal_domains.csv file:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;* [|inputlookup mal_domains.csv | rename domain as URL | fields + URL]
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 13 May 2010 10:15:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13408#M1244</guid>
      <dc:creator>bwooden</dc:creator>
      <dc:date>2010-05-13T10:15:32Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13409#M1245</link>
      <description>&lt;P&gt;Here's a tricky (and efficient) way of doing it:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index=someindex | append [ | inputlookup mal_domains.csv | eval index="anotherindex" ] | stats count by domain,index | stats count by domain | where count &amp;gt; 1
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 13 May 2010 14:19:03 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13409#M1245</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-05-13T14:19:03Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13410#M1246</link>
      <description>&lt;P&gt;Thanks bwooden, makes a bit more sense now.&lt;/P&gt;</description>
      <pubDate>Fri, 14 May 2010 00:39:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13410#M1246</guid>
      <dc:creator>Chris_R_</dc:creator>
      <dc:date>2010-05-14T00:39:45Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13411#M1247</link>
      <description>&lt;P&gt;I know this is a REALLY old question, but I had to implement something similar just now and thought I'd share a tip. Assuming that you're trying to just find any events that have a URL matching what's in the lookup table, I noticed a HUGE performance difference between the last example as given above vs using a join. For example, I was first trying:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=dns [|inputlookup dns_watchlist.csv]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This took a considerable time to run, vs when I just did a search for sourcetype=dns over the same timerange. Looks like it was basically running the entire lookup table against each result. Since I only want results for events that contain something in the lookup table, I changed it to use a join:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=dns | join query_value [|inputlookup dns_watchlist.csv]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This has a completely negligible impact on performance compared to the straight sourcetype=dns search.&lt;/P&gt;
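
&lt;P&gt;One caveat: the join only works if the lookup table has a column with the same name as the event field. A minimal sketch, assuming the watchlist column were named &lt;EM&gt;domain&lt;/EM&gt; as in the original question (that column name is an assumption, not something taken from my actual file), would rename it inside the subsearch:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;sourcetype=dns | join query_value [|inputlookup dns_watchlist.csv | rename domain as query_value | fields + query_value]
&lt;/CODE&gt;&lt;/PRE&gt;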

&lt;P&gt;Hope that maybe helps someone else looking to do something similar.&lt;/P&gt;</description>
      <pubDate>Thu, 05 May 2011 20:40:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13411#M1247</guid>
      <dc:creator>tmeader</dc:creator>
      <dc:date>2011-05-05T20:40:16Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13412#M1248</link>
      <description>&lt;P&gt;I suppose that's possible if sourcetype=dns doesn't have very many events and the dns_watchlist.csv is long, exposing inefficiencies when there are too many search terms.  But the subsearch should be faster than join 99%+ of the time.&lt;/P&gt;</description>
      <pubDate>Wed, 30 May 2012 20:23:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13412#M1248</guid>
      <dc:creator>carasso</dc:creator>
      <dc:date>2012-05-30T20:23:51Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13413#M1249</link>
      <description>&lt;P&gt;This does not seem very efficient to me. If your bad domains only occur in a low percentage of events, you're reading all those events from disk only to discard them at the end. Compare that to bwooden's answer, where you're actually searching for and returning only those events that contain the bad domain. In my experience, I/O is the most common limiting factor for Splunk.&lt;/P&gt;

&lt;P&gt;Additionally, at the end of this query, all you'd have are the domains that alerted. In my case, I usually want to see the whole event.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2013 19:59:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13413#M1249</guid>
      <dc:creator>supersleepwalke</dc:creator>
      <dc:date>2013-10-25T19:59:28Z</dc:date>
    </item>
    <item>
      <title>Re: Inputlookup against a list of bad domains</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13414#M1250</link>
      <description>&lt;P&gt;Very clear, you helped me cure my headache &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Thu, 07 Aug 2014 18:53:34 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Inputlookup-against-a-list-of-bad-domains/m-p/13414#M1250</guid>
      <dc:creator>gnoellbn</dc:creator>
      <dc:date>2014-08-07T18:53:34Z</dc:date>
    </item>
  </channel>
</rss>

