For the benefit of answering this question, I will point out that between DalJeanis and I, we came up with the following query:
index=pan_logs earliest=-1s log_subtype=url
| rex field=dest_hostname "(?P<ld2>[\w_-]+\.[\w_-]+)$"
| rex field=dest_hostname "(?P<ld3>[\w_-]+\.[\w_-]+\.[\w_-]+)$"
| rex field=dest_hostname "(?P<ld4>[\w_-]+\.[\w_-]+\.[\w_-]+\.[\w_-]+)$"
| `alexa_lookup(ld2)`
| `alexa_lookup(ld3)`
| `alexa_lookup(ld4)`
| eval rank=coalesce(ld4_rank, ld3_rank, ld2_rank, "unknown")
| eval ld=case(ld4_rank>0,"4",ld3_rank>0,"3",ld2_rank>0,"2",1=1,"NA")
| table dest_hostname rank ld
DalJeanis also noted that the inverselookup (not present in the Alexa top 1M) can be done as follows:
index=pan_logs earliest=-1s log_subtype=url
| rex field=dest_hostname "(?P<ld2>[\w_-]+\.[\w_-]+)$"
| `alexa_lookup(ld2)`
| where isnull(ld2_rank) OR ld2_rank>1000000
| rex field=dest_hostname "(?P<ld3>[\w_-]+\.[\w_-]+\.[\w_-]+)$"
| `alexa_lookup(ld3)`
| where isnull(ld3_rank) OR coalesce(ld3_rank,ld2_rank)>1000000
| rex field=dest_hostname "(?P<ld4>[\w_-]+\.[\w_-]+\.[\w_-]+\.[\w_-]+)$"
| `alexa_lookup(ld4)`
| where isnull(ld4_rank) OR coalesce(ld4_rank,ld3_rank,ld2_rank)>1000000
| eval rank=coalesce(ld4_rank, ld3_rank, ld2_rank, "unknown")
| eval ld=case(ld4_rank>0,"4",ld3_rank>0,"3",ld2_rank>0,"2",1=1,"NA")
| table dest_hostname rank ld
... View more