Making a lookup optional? (Or, how to build a multi-level lookup?)

Lowell
Super Champion

I have a scenario where I would like to do a two-layered lookup. I'm essentially doing an IP address lookup against a table (.csv file) of well-known hosts, and then I want to do a full (reverse) DNS lookup for any IP that fails to match my first lookup.

There are a few reasons that push me toward this two-layered lookup approach:

  1. I don't have control over the DNS entries.
  2. The DNS names must be returned consistently. (I have a handful of servers that return a rotating list of hostnames, which is not acceptable in this case.)
  3. Most of the IPs I'm dealing with are already known and are static.
  4. Splunk rocks, and should be able to do this. 😉

So basically I'm looking for the logic: Use the first lookup, and if it fails for a given event, then (and only then) use the secondary lookup.
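In Python terms, the requested logic amounts to a "first match wins" chain, which also generalizes to more than two levels. This is only a minimal sketch of the semantics, not Splunk code: `chained_lookup`, `STATIC_HOSTS` and `fake_reverse_dns` are hypothetical names, and the fake resolver stands in for a real DNS query.

```python
from typing import Callable, List, Optional

# A resolver maps an IP string to a hostname, or None when it has no answer.
Resolver = Callable[[str], Optional[str]]


def chained_lookup(ip: str, resolvers: List[Resolver]) -> str:
    """Return the answer of the first resolver that knows the IP.

    Later (more expensive) resolvers are only called when every earlier one
    returned None; if none of them knows the IP, fall back to the IP itself.
    """
    for resolve in resolvers:
        hostname = resolve(ip)
        if hostname is not None:
            return hostname
    return ip


# Hypothetical stand-ins for local_dns (static CSV) and dnslookup (reverse DNS).
STATIC_HOSTS = {"10.0.0.5": "db01", "10.0.0.6": "web01"}
dns_calls: List[str] = []


def fake_reverse_dns(ip: str) -> Optional[str]:
    dns_calls.append(ip)  # record which IPs reached the "expensive" layer
    return {"10.0.0.7": "mail.example.com"}.get(ip)


layers: List[Resolver] = [STATIC_HOSTS.get, fake_reverse_dns]
print(chained_lookup("10.0.0.5", layers))  # db01
print(chained_lookup("10.0.0.7", layers))  # mail.example.com
print(chained_lookup("10.0.0.9", layers))  # 10.0.0.9
print(dns_calls)  # ['10.0.0.7', '10.0.0.9']
```

Ordering the resolvers from cheapest to most expensive is what guarantees the costly layer only sees IPs that every earlier layer missed.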

What's the best way to accomplish this?

Lowell
Super Champion

Here is the solution I came up with. (I welcome any feedback or alternative ideas.)

Here is my search:

sourcetype=my_sourcetype ip=*
[1]    | lookup local_dns ip OUTPUT hostname 
[2]    | eval clientip=if(isnull(hostname), ip, null())
[3]    | lookup dnslookup clientip OUTPUT clienthost
[4]    | eval hostname=coalesce(hostname, clienthost, ip)
[5]    | fields - clientip clienthost

Note: I've broken this search into separate lines and numbered them for readability.

Explanation:

The goal of these search commands is to start with an ip field and "lookup" the best value for hostname using two different lookups.

Line [1] does the first lookup against a local .csv lookup file. If the lookup doesn't match, the hostname field remains missing for that event. So in [2], we assign a value to clientip only if the first lookup failed (i.e., when hostname is null). The second lookup [3] should only be performed for events that have a value for clientip (can anyone confirm this point?). That matters because it is the more expensive external lookup: a script that makes a bunch of socket.gethostbyaddr calls. Finally, in [4] we consolidate our values: we keep hostname from the first lookup, or else clienthost from the second lookup, or else ip if both lookups failed. Then in [5] we simply remove our temporary fields.
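The five steps can be mimicked outside Splunk to check the intended behavior. In this Python sketch each event is a dict, a plain dict stands in for the local_dns CSV, and `reverse_dns` is any callable standing in for the dnslookup script; `two_level_lookup`, `fake_reverse_dns` and the sample IPs are hypothetical. It simply assumes the answer to the open question above is yes (events without a clientip never reach the expensive lookup); it says nothing about how Splunk itself batches external lookup calls.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, Optional[str]]


def two_level_lookup(
    events: List[Event],
    local_dns: Dict[str, str],
    reverse_dns: Callable[[str], Optional[str]],
) -> List[Event]:
    """Mimic steps [1]-[5] on events that all have an ip (the search uses ip=*)."""
    out: List[Event] = []
    for event in events:
        e = dict(event)
        # [1] lookup local_dns ip OUTPUT hostname: a miss leaves hostname null
        e["hostname"] = local_dns.get(e["ip"])
        # [2] eval clientip=if(isnull(hostname), ip, null())
        e["clientip"] = e["ip"] if e["hostname"] is None else None
        # [3] lookup dnslookup clientip OUTPUT clienthost (assumed to skip null clientip)
        e["clienthost"] = reverse_dns(e["clientip"]) if e["clientip"] is not None else None
        # [4] eval hostname=coalesce(hostname, clienthost, ip): first non-null wins
        e["hostname"] = next(v for v in (e["hostname"], e["clienthost"], e["ip"]) if v is not None)
        # [5] fields - clientip clienthost
        del e["clientip"], e["clienthost"]
        out.append(e)
    return out


calls: List[str] = []


def fake_reverse_dns(ip: str) -> Optional[str]:
    calls.append(ip)  # track which IPs reach the expensive lookup
    return {"192.0.2.10": "mail.example.com"}.get(ip)


events = [{"ip": "10.0.0.5"}, {"ip": "192.0.2.10"}, {"ip": "192.0.2.99"}]
for row in two_level_lookup(events, {"10.0.0.5": "db01"}, fake_reverse_dns):
    print(row)
# {'ip': '10.0.0.5', 'hostname': 'db01'}
# {'ip': '192.0.2.10', 'hostname': 'mail.example.com'}
# {'ip': '192.0.2.99', 'hostname': '192.0.2.99'}
print(calls)  # ['192.0.2.10', '192.0.2.99']
```

Only the two IPs that missed the local table ever reach the fake resolver, which is exactly the saving the clientip trick in [2] is meant to buy.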

Lowell
Super Champion

There are a couple of IPs that don't resolve, and running the dnslookup before stats really slows the search down. (The default bundled version of external_lookup.py performs pretty poorly. I started writing a replacement, but I'm not sure where it got to... whoops.)
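One possible shape for such a replacement, sketched in Python. It assumes the external lookup protocol the bundled script relies on (Splunk writes a CSV with a header row of the lookup's fields to the script's stdin and reads the same CSV, with blanks filled in, back from stdout) and takes the host and IP field names as arguments in the same order as `external_lookup.py clienthost clientip`. The main idea is a per-invocation cache that also remembers failures, so an IP that doesn't resolve is queried at most once per batch. The script name and the functions `fill_hosts` and `main` are hypothetical; this is not a reconstruction of the bundled script or of the unfinished replacement mentioned above.

```python
import csv
import socket
import sys
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def reverse_dns(ip: str) -> Optional[str]:
    """PTR lookup via socket.gethostbyaddr; returns None instead of raising."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return None


def fill_hosts(
    lines: Iterable[str],
    host_field: str,
    ip_field: str,
    resolve: Callable[[str], Optional[str]] = reverse_dns,
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse lookup rows from CSV lines and fill in host_field wherever only
    ip_field is set. Returns the header and the completed rows."""
    reader = csv.DictReader(lines)
    fieldnames = list(reader.fieldnames or [])
    cache: Dict[str, Optional[str]] = {}  # also remembers failures (negative caching)
    rows = []
    for row in reader:
        ip = row.get(ip_field)
        if ip and not row.get(host_field):
            if ip not in cache:  # each distinct IP is resolved at most once
                cache[ip] = resolve(ip)
            row[host_field] = cache[ip] or ""  # unresolved IPs stay blank
        rows.append(row)
    return fieldnames, rows


def main(host_field: str, ip_field: str) -> None:
    fieldnames, rows = fill_hosts(sys.stdin, host_field, ip_field)
    if fieldnames:
        writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Only act as a lookup script when given the two field names, e.g.
# (hypothetical script name) my_dns_lookup.py clienthost clientip
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Caching doesn't make the first failure for a given IP any faster: socket.gethostbyaddr takes no timeout argument, so a slow resolver still blocks until it gives up. It only stops repeated events for the same unresolvable IP from paying that cost again within one batch.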

Lowell
Super Champion

I actually have it set up that way too, so the local_dns lookup is technically done twice. I bundled this sequence of commands into a macro, which I use in a few places, and I call the macro after a stats command (which keeps only ip, not the hostname field, which is only partially known at that point). I found it's easier, and much more efficient, to run the file-based lookup twice than to put the dnslookup before stats. Doing the latter results in multiple lookup executions and can lead to a single IP returning more than one hostname. Doh!
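Why calling the macro after stats pays off can also be sketched in Python: aggregate first (the analogue of `stats count by ip`), then run the two-level lookup once per distinct IP, re-running the cheap table lookup along the way. `stats_then_lookup` and the round-robin `rotating_dns` resolver are hypothetical; the resolver mimics the rotating hostnames from the question to show that each IP ends up with exactly one name.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def stats_then_lookup(
    ips: List[str],
    local_dns: Dict[str, str],
    reverse_dns: Callable[[str], Optional[str]],
) -> Dict[str, Tuple[str, int]]:
    """Collapse events to one row per ip (like `stats count by ip`),
    then resolve each distinct ip once: local table first, DNS second."""
    rows = Counter(ips)  # insertion-ordered: first-seen ip comes first
    result: Dict[str, Tuple[str, int]] = {}
    for ip, n in rows.items():
        hostname = local_dns.get(ip)  # cheap file-based lookup, rerun after stats
        if hostname is None:
            hostname = reverse_dns(ip) or ip  # expensive lookup, once per ip
        result[ip] = (hostname, n)
    return result


# Hypothetical round-robin PTR answers, like the rotating hostnames in the question.
calls: List[str] = []


def rotating_dns(ip: str) -> Optional[str]:
    calls.append(ip)
    return ["a.example.com", "b.example.com"][(len(calls) - 1) % 2]


events = ["10.0.0.5", "192.0.2.10", "192.0.2.10", "192.0.2.10"]
print(stats_then_lookup(events, {"10.0.0.5": "db01"}, rotating_dns))
# {'10.0.0.5': ('db01', 1), '192.0.2.10': ('a.example.com', 3)}
print(calls)  # ['192.0.2.10']
```

Had the three 192.0.2.10 events each been resolved before aggregation, the rotating resolver would have returned a.example.com and b.example.com for the same IP, which is the "more than one host name" problem described above.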

jrodman
Splunk Employee

I'm sure you're aware it's possible to make the first lookup implicit (i.e., an automatic lookup). The rest will likely have to work something like what you've cooked up.
