What is the basic difference between the lookup, inputlookup, and outputlookup commands?

Path Finder

Good afternoon All,

I am having a hard time trying to understand the difference between "lookup", "inputlookup", and "outputlookup". I am also trying to get a basic real world example of why one may use one over the other. I am assuming that you first have to create the actual lookup file, which I have done from a static csv file that contains some malicious domains. I called this file badfile.csv.

My badfile.csv contains a field of "Domain" and let's say I am trying to search my "weblogs" sourcetype, and those logs also have the field name of "Domain". I know I need a common field in my lookup file that matches the sourcetype I am trying to search from, so a correlation can be made.

I am trying to figure out if I could use the "inputlookup" command to search for any hits or if I need to use the "lookup" command, or if I need to use a combination of both. Also, how would the "outputlookup" command play into this?

I guess I am not sure what inputlookup vs. lookup does, and I am just looking for a clearer definition.

Any information that anyone can provide to give a basic understanding to a beginner is much appreciated.

Thanks

Influencer

For reference, the docs have a page for each command: lookup, inputlookup, and outputlookup.

In short:

  • lookup enriches each existing event in your result set: when a field in the event matches a value in the lookup, the lookup's other fields are added to that event.
  • inputlookup reads the lookup table itself and returns its rows as results, either as a brand-new result set or appended to a prior one (with append=true).
  • outputlookup takes the current result set and writes it to a CSV file or a KV Store collection.
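To make the three bullets concrete, here is a rough sketch of each command used on its own. It assumes a lookup definition named baddomains with the fields Domain and is_bad; seen_domains.csv is a made-up file name used only for illustration. Each line is a separate search: the first returns the lookup rows themselves, the second adds is_bad to matching weblogs events, and the third writes a per-domain count out as a new lookup file.

```spl
| inputlookup baddomains

sourcetype=weblogs | lookup baddomains Domain OUTPUT is_bad

sourcetype=weblogs | stats count BY Domain | outputlookup seen_domains.csv
```

A typical real-world use of outputlookup is a scheduled search that keeps a lookup file like this up to date, so that other searches can read it back with lookup or inputlookup.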

For your case, there are two ways I can think of offhand, each with certain tradeoffs. Assuming you have a lookup definition named baddomains with the field Domain, one way to search would be:

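sourcetype=weblogs [| inputlookup baddomains | fields Domain]

Inside the subsearch, inputlookup needs the leading pipe because it is a generating command, and fields Domain keeps any other lookup columns out of the generated filter. The subsearch translates your lookup into the query ((Domain="bad.com") OR (Domain="bad.biz") ... ) and inserts it into the parent search. Subsearches are limited in the number of rows they return and in execution time (by default 10,000 results and 60 seconds), so you'll want to check whether your lookup fits within those limits.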
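If you want to see what the subsearch will hand to the parent search, one option is to run it by itself and pipe it through the format command. This is only a sketch, again assuming the baddomains lookup definition with a Domain field:

```spl
| inputlookup baddomains | fields Domain | format
```

The result is a single row whose search field holds the generated OR expression, which makes it easy to check the filter before embedding it in the weblogs search.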

If your lookup has another field, say is_bad, that holds a "1" when a domain is bad, then your search could be:

sourcetype=weblogs | lookup baddomains Domain OUTPUT is_bad | where is_bad="1"

Additionally, you can configure an automatic lookup in props.conf so that the lookup runs for all events in your sourcetype, and then just search:

sourcetype=weblogs is_bad=1
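For anyone who has not set up an automatic lookup before, the configuration could look roughly like the following. Treat it as a sketch under assumptions: the stanza names come from the examples in this thread, badfile.csv is the file from the question, and the part after the LOOKUP- prefix is an arbitrary label I picked.

```ini
# transforms.conf: the lookup definition, pointing at the CSV from the question
[baddomains]
filename = badfile.csv

# props.conf: automatic lookup applied to every weblogs event at search time
# ("baddomains" after "LOOKUP-" is just a label chosen for this sketch)
[weblogs]
LOOKUP-baddomains = baddomains Domain OUTPUT is_bad
```

The same thing can be set up in the UI under the lookup settings, which writes equivalent stanzas for you.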

The tradeoff shows up in a distributed environment, where your search head is separate from your indexers. Either the lookup has to be distributed to your indexers, or you have to stream all the events to the search head and perform the lookup there with local=true. For a CSV lookup, any change means redistributing the whole file each time, which could be expensive depending on the size of the lookup; a KV Store lookup can distribute per key-value pair, but takes some additional setup, IIRC. Running the lookup on the search head could be expensive depending on how much data you're talking about, and it precludes the automatic lookup. You are also potentially reading a lot more data than is initially necessary.
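For completeness, forcing the lookup to run on the search head would look something like this. It is a sketch that reuses the assumed baddomains lookup and is_bad field from above:

```spl
sourcetype=weblogs | lookup local=true baddomains Domain OUTPUT is_bad | where is_bad="1"
```

With local=true, every weblogs event in the time range is sent to the search head before it is filtered, so it is worth narrowing the base search as much as you can first.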

Both ways can work, but a lot depends on the relative sizes of your data and on your environment.

Legend

I like this answer, but I want to point out one additional thing. IMO, you should always define a default to return if there is no match. In the UI, you will find it in the "advanced options" when you are setting up a CSV-based lookup.

If you set the default match to "unknown", you can use that to filter searches as well. This is particularly useful when you can't use the subsearch. For example, let's say that [| inputlookup banned_ips] fails because there are too many items in the lookup table. You could use the same table and do this instead:

sourcetype=weblogs | lookup banned_ips clientip OUTPUT status | where status="unknown"

This would only list the events where the clientip has NOT been banned. This is a variation of the "is_bad" example, but it can be easier to set up, depending on how you obtained your lookup table.
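If you prefer editing the configuration directly over using the UI, the default match described above corresponds to two settings in transforms.conf. A minimal sketch, assuming the lookup definition is called banned_ips and is backed by a file named banned_ips.csv (the file name is my assumption):

```ini
# transforms.conf
[banned_ips]
filename = banned_ips.csv
# return at least one result per event, even when nothing matches
min_matches = 1
# value written to the output field when there is no match
default_match = unknown
```

With this in place, flipping the filter to where status!="unknown" would list only the events whose clientip is in the table.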

Tutorial on Setting Up a Lookup
Lookup field matching rules in transforms.conf

Path Finder

Thank you so much for your answer. I have a better understanding now. I really appreciate it.
