I have a scenario where I would like to do a two-layered lookup. I'm essentially doing an IP address lookup against a table (a .csv file)
of well-known hosts, and then I want to do a full DNS lookup for any IP that fails to match in that first lookup.
There are a couple of reasons that push me to this 2-layered lookup approach:
So basically I'm looking for the logic: Use the first lookup, and if it fails for a given event, then (and only then) use the secondary lookup.
What's the best way to accomplish this?
Here is the solution I came up with. (I welcome any feedback or alternate ideas.)
Here is my search:
sourcetype=my_sourcetype ip=*
[1] | lookup local_dns ip OUTPUT hostname
[2] | eval clientip=if(isnull(hostname), ip, null())
[3] | lookup dnslookup clientip OUTPUT clienthost
[4] | eval hostname=coalesce(hostname, clienthost, ip)
[5] | fields - clientip clienthost
Note: I have broken this search onto separate lines and numbered them for readability.
Explanation:
The goal of these search commands is to start with an ip field and look up the best value for hostname using two different lookups.

Line [1] does the first lookup against a local .csv lookup file. If the lookup doesn't match, the hostname field remains missing for that event. So in [2], we assign a value to clientip only if the first lookup failed (i.e. when hostname is null). The second lookup [3] should only be performed for events that have a value for clientip (can anyone confirm this point?). This is the more expensive external lookup script, which makes a bunch of socket.gethostbyaddr calls. Finally, in [4] we consolidate our values: we keep either hostname from the first lookup, or clienthost from the second lookup, or ip if both lookups failed. Then in [5] we simply remove our temporary fields.
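To make the fallback logic in [1] through [4] concrete outside of Splunk, here is a rough Python model of it. This is only a sketch, not how Splunk runs lookups internally: the function name resolve_hostname, the local_table dict standing in for the .csv lookup, and the injectable reverse_dns callable are all made up for illustration.

```python
import socket


def _reverse_dns(ip):
    """Expensive fallback: reverse DNS via the standard library."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def resolve_hostname(ip, local_table, reverse_dns=_reverse_dns):
    """Return the best hostname for ip, using the cheap table first.

    local_table is a hypothetical dict standing in for the .csv lookup;
    reverse_dns is any callable taking an ip and returning a name or None.
    """
    hostname = local_table.get(ip)   # [1] file-based lookup
    if hostname is None:             # [2] only unmatched ips go on
        hostname = reverse_dns(ip)   # [3] expensive lookup, misses only
    return hostname or ip            # [4] coalesce(hostname, clienthost, ip)
```

Passing the resolver in as a parameter keeps the expensive call behind the "first lookup missed" check, and makes it easy to swap in a stub when trying the logic without touching the network.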
There are a couple of ips that don't resolve, and running the dnslookup before stats causes the search to really slow down. (The default bundled version of external_lookup.py performs pretty poorly. I started writing a replacement, but I'm not sure where it got to... whoops.)
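For what it's worth, one way a replacement script could avoid hammering DNS for the same address is to cache results, including failures, so an ip that doesn't resolve is only tried once. The following is a minimal sketch under that assumption; it is not the bundled external_lookup.py nor the replacement mentioned above, and make_cached_resolver and its parameters are hypothetical names.

```python
import socket
from functools import lru_cache


def make_cached_resolver(resolver=None, maxsize=10000):
    """Wrap a reverse-DNS function so each ip is resolved at most once.

    resolver is any callable taking an ip string and returning a name
    or None; by default it uses socket.gethostbyaddr.
    """
    if resolver is None:
        def resolver(ip):
            try:
                return socket.gethostbyaddr(ip)[0]
            except OSError:
                return None  # unresolvable ip

    # lru_cache also remembers None results, so failed lookups
    # are not retried for the lifetime of the cache.
    return lru_cache(maxsize=maxsize)(resolver)
```

Note that the cache only lives as long as the process does, so this helps within a single run of the script rather than across separate searches.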
I actually have it set up that way too, so the local_dns lookup is technically done twice. I bundled this sequence of commands into a macro which I use in a few places, and I call the macro after a stats command (which keeps only ip and not the partially-known hostname field). I just found it's easier that way, and much more efficient, to double up the execution of the file-based lookup rather than putting the dnslookup before stats (which results in multiple lookup executions and can lead to a single ip returning more than one host name. Doh!)
I'm sure you're aware it's possible to make the first lookup implicit. The rest will likely have to work something like what you've cooked up.