Solved: Making a lookup optional? (Or, how to build a multi-level lookup?)

Lowell · 06-25-2010

I have a scenario where I would like to do a two-layered lookup. I'm essentially doing an IP address lookup against a table (.csv file) for a number of well-known hosts, and then I want to do a full DNS lookup for any IP that fails to match against my first lookup.

There are a couple of reasons that push me to this 2-layered lookup approach:

I don't have control over the DNS entries.
The DNS names must be returned consistently. (I have a handful of servers that return a rotating list of hostnames, which is not acceptable in this case.)
Most of the IPs I'm dealing with are already known and are static.
Splunk rocks, and should be able to do this. 😉

So basically I'm looking for the logic: Use the first lookup, and if it fails for a given event, then (and only then) use the secondary lookup.

What's the best way to accomplish this?
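The fallback logic being asked for can be sketched in Python (the language of the bundled external_lookup.py) as a chain of lookups tried in order, where a later, more expensive lookup runs only when every earlier one has missed. This is a minimal sketch, not Splunk's implementation: the CSV columns `ip,hostname` and the function names `csv_lookup`, `dns_lookup`, and `multi_level_lookup` are hypothetical.

```python
import csv
import io
import socket
from typing import Callable, Iterable, Optional

# A lookup takes an IP and returns a hostname, or None on a miss.
Lookup = Callable[[str], Optional[str]]


def csv_lookup(csv_text: str) -> Lookup:
    """Build a static lookup from CSV text with (assumed) columns ip,hostname."""
    table = {row["ip"]: row["hostname"] for row in csv.DictReader(io.StringIO(csv_text))}
    return table.get


def dns_lookup(ip: str) -> Optional[str]:
    """Reverse DNS lookup; returns None if the address does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def multi_level_lookup(ip: str, lookups: Iterable[Lookup]) -> str:
    """Try each lookup in order; later (more expensive) ones run only on a miss."""
    for lookup in lookups:
        result = lookup(ip)
        if result:  # treat None or an empty string as a miss
            return result
    return ip  # nothing matched: fall back to the raw IP
```

For example, `multi_level_lookup("192.0.2.10", [csv_lookup(text), dns_lookup])` consults the table first and only issues a reverse DNS query when the table has no entry for that IP.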

Lowell · 06-25-2010

Here is the solution I came up with. (I welcome any feedback or alternate ideas.)

Here is my search:

sourcetype=my_sourcetype ip=*
[1]    | lookup local_dns ip OUTPUT hostname 
[2]    | eval clientip=if(isnull(hostname), ip, null())
[3]    | lookup dnslookup clientip OUTPUT clienthost
[4]    | eval hostname=coalesce(hostname, clienthost, ip)
[5]    | fields - clientip clienthost

Note: I've broken this search into separate lines and numbered them for readability.

Explication:

The goal of these search commands is to start with an ip field and "look up" the best value for hostname using two different lookups.

Line [1] does the first lookup against a local .csv lookup file. If the lookup doesn't match, the hostname field remains missing for that event. So in [2], we assign a value to clientip only if the first lookup failed (i.e., when hostname is null). The second lookup [3] should only be performed for events that have a value for clientip (can anyone confirm this point?). That matters because [3] is the more expensive external lookup script, which makes a bunch of socket.gethostbyaddr calls. Finally, in [4] we consolidate our values: we keep hostname from the first lookup, or clienthost from the second lookup, or ip if both lookups failed. Then in [5] we simply remove our temporary fields.
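To make the per-event behavior of steps [1] to [5] concrete, here is a small Python model of the same search over a list of events. It is a sketch under assumptions: events are plain dicts, `local_dns` is an in-memory dict standing in for the .csv lookup, and `dnslookup` is any function returning a hostname or None. It models the intended behavior of [3] (only events with a clientip reach the expensive lookup); it does not confirm how Splunk's lookup command actually treats events without a clientip.

```python
from typing import Callable, Dict, List, Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first non-null argument, like SPL's coalesce()."""
    return next((v for v in values if v is not None), None)


def two_level_lookup(events: List[Dict[str, str]],
                     local_dns: Dict[str, str],
                     dnslookup: Callable[[str], Optional[str]]) -> List[Dict[str, str]]:
    """Model of the five-step search; comments carry the matching [n] labels."""
    out = []
    for event in events:
        ip = event["ip"]
        hostname = local_dns.get(ip)                 # [1] None when no row matches
        clientip = ip if hostname is None else None  # [2] set only on a miss
        # [3] the expensive lookup runs only for events that have a clientip
        clienthost = dnslookup(clientip) if clientip is not None else None
        # [4] hostname, else clienthost, else the raw ip
        out.append({**event, "hostname": coalesce(hostname, clienthost, ip)})
        # [5] clientip and clienthost are locals, so nothing extra is left on the event
    return out
```

With a table containing only 10.0.0.1, an event for 10.0.0.1 never reaches `dnslookup`, while events for unknown IPs do, and an IP that neither lookup resolves keeps its raw address as the hostname.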

Lowell · 06-25-2010

There are a couple of IPs that don't resolve, and running the DNS lookup before stats really slows the search down. (The default bundled version of external_lookup.py performs pretty poorly. I started writing a replacement, but I'm not sure where it got to... whoops.)
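One way a replacement script could avoid paying repeatedly for IPs that don't resolve is to cache misses as well as hits. This is a hypothetical sketch, not the bundled external_lookup.py or the replacement mentioned above: `make_cached_resolver` and `reverse_dns` are illustrative names, and the cache only lives as long as the Python process, so it helps only within a single run of the script. Note that `socket.gethostbyaddr` itself takes no timeout argument, so the first lookup of an unresolvable IP is still slow.

```python
import socket
from functools import lru_cache
from typing import Callable, Optional


def make_cached_resolver(resolve: Callable[[str], str],
                         maxsize: int = 4096) -> Callable[[str], Optional[str]]:
    """Wrap a reverse-DNS function so that hits AND misses are cached.

    A miss is stored as None, so an IP that does not resolve costs one slow
    call per process instead of one call per event.
    """
    @lru_cache(maxsize=maxsize)
    def cached(ip: str) -> Optional[str]:
        try:
            return resolve(ip)
        except OSError:  # socket.herror and socket.gaierror subclass OSError
            return None
    return cached


def reverse_dns(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the primary name
    return socket.gethostbyaddr(ip)[0]


# Hypothetical module-level resolver a lookup script could call per input row.
lookup_host = make_cached_resolver(reverse_dns)
```

A side effect of caching is that, within one run, a given IP always maps to the same name, which also sidesteps the rotating-hostname problem for that run.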

Lowell · 06-25-2010

I actually have it set up that way too, so the local_dns lookup is technically done twice. I bundled this sequence of commands into a macro that I use in a few places, and I call the macro after a stats command (which keeps only ip, not the hostname field, which is only partially populated at that point). I just found it's easier that way, and much more efficient to double up the execution of the file-based lookup than to put the dnslookup before stats. (Doing that results in multiple lookup executions and can lead to a single IP returning more than one hostname. Doh!)
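The reason for calling the macro after stats can be modelled the same way: aggregate first, then run the two-level lookup once per distinct IP, so each IP is resolved exactly once and gets exactly one hostname even if DNS rotates its answers. In this hypothetical Python sketch, `Counter` stands in for a stats command such as `stats count by ip`, and `stats_then_lookup` is an illustrative name, not part of Splunk.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple


def stats_then_lookup(ips: Iterable[str],
                      local_dns: Dict[str, str],
                      dnslookup: Callable[[str], Optional[str]]
                      ) -> Dict[str, Tuple[str, int]]:
    """Aggregate first, then look up each distinct IP exactly once."""
    counts = Counter(ips)  # stands in for something like: stats count by ip
    rows: Dict[str, Tuple[str, int]] = {}
    for ip, count in counts.items():
        hostname = local_dns.get(ip)            # cheap file-based lookup
        if hostname is None:
            hostname = dnslookup(ip) or ip      # expensive lookup, once per distinct IP
        rows[ip] = (hostname, count)
    return rows
```

Even with a resolver that returns a different name on every call, each aggregated row ends up with a single hostname, because the resolver is consulted once per IP rather than once per event.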

jrodman · 06-25-2010

I'm sure you're aware that it's possible to make the first lookup implicit (an automatic lookup). The rest will likely have to work something like what you've cooked up.
