I have a scenario where I would like to do a two-layered lookup. I'm essentially doing an IP address lookup against a table (.csv file) for a number of well-known hosts, and then I want to do a full DNS lookup for any IP that fails to match in that first lookup.
There are a couple of reasons that push me to this 2-layered lookup approach:
So basically I'm looking for the logic: Use the first lookup, and if it fails for a given event, then (and only then) use the secondary lookup.
What's the best way to accomplish this?
Here is the solution I came up with. (I welcome any feedback or alternate ideas.)
Here is my search:
sourcetype=my_sourcetype ip=*
[1] | lookup local_dns ip OUTPUT hostname
[2] | eval clientip=if(isnull(hostname), ip, null())
[3] | lookup dnslookup clientip OUTPUT clienthost
[4] | eval hostname=coalesce(hostname, clienthost, ip)
[5] | fields - clientip clienthost
Note: The search is split across separate lines, and the lines are numbered, for readability.
Explanation:
The goal of these search commands is to start with an ip field and "lookup" the best value for hostname using two different lookups.
Line [1] does the first lookup against a local .csv lookup file. If the lookup doesn't match, then the hostname field remains missing for that event. So in [2], we assign a value to clientip only if the first lookup failed (i.e. when hostname is null). The second lookup [3] should only be performed for events that have a value for clientip (Can anyone confirm this point?). This is the more expensive external lookup script that makes a bunch of socket.gethostbyaddr calls. Finally, in [4] we consolidate our values: we keep either hostname from the first lookup, or clienthost from the second lookup, or ip if both lookups failed. Then in [5] we simply remove our temporary fields.
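For anyone who finds the fallback logic easier to read outside of SPL, here is a rough Python sketch of what steps [1] through [4] are meant to do for a single ip. The names best_hostname, reverse_dns, and local_table are made up for illustration, and the dict stands in for the .csv lookup. This only models the intended logic; it says nothing about whether Splunk actually skips the external lookup for events where clientip is null.

```python
import socket


def reverse_dns(ip):
    """Return the DNS name for ip, or None if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def best_hostname(ip, local_table, resolver=reverse_dns):
    """Two-layered lookup: local table first, DNS only on a miss."""
    hostname = local_table.get(ip)   # [1] cheap file-based lookup
    if hostname is None:             # [2] only unmatched ips go further
        hostname = resolver(ip)      # [3] expensive external lookup
    return hostname or ip            # [4] fall back to the raw ip
```

The resolver is passed in as a parameter so the expensive step can be swapped for a stub when experimenting, e.g. best_hostname("10.0.0.1", {"10.0.0.1": "known.example"}) never touches DNS.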
There are a couple of IPs that don't resolve, and running the DNS lookup before stats causes the search to really slow down. (The default bundled version of external_lookup.py performs pretty poorly. I started writing a replacement, but I'm not sure where it got to... whoops.)
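One idea that might help with the slow, non-resolving addresses is to remember every answer, including failures, so each ip is only resolved once per run. This is just a sketch under my own assumptions: CachingResolver is a hypothetical name, it is neither the bundled external_lookup.py nor the replacement mentioned above, and the cache only lives as long as the Python process does.

```python
import socket


class CachingResolver:
    """Reverse-DNS resolver that caches hits and misses per ip."""

    def __init__(self, resolve=None):
        # resolve is injectable so the class can be tried without real DNS
        self._resolve = resolve or self._gethostbyaddr
        self._cache = {}

    @staticmethod
    def _gethostbyaddr(ip):
        try:
            return socket.gethostbyaddr(ip)[0]
        except OSError:
            return None  # treat any resolution failure as "no name"

    def __call__(self, ip):
        if ip not in self._cache:
            # store None for failures too, so a bad ip is only tried once
            self._cache[ip] = self._resolve(ip)
        return self._cache[ip]
```

The point of caching the None results is that the addresses which never resolve are usually the ones that cost the most time.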
I actually have it set up that way too, so the local_dns lookup is technically done twice. I bundled this sequence of commands into a macro that I use in a few places, and I call the macro after a stats command (which keeps only ip, not the partially known hostname field). I found it easier and much more efficient to double up the execution of the file-based lookup than to put the dnslookup before stats (which results in multiple lookup executions and can lead to a single ip returning more than one host name. Doh!)
I'm sure you're aware it's possible to make the first lookup implicit. The rest will likely have to work something like what you've cooked up.