
Building a passive DNS lookup in Splunk (large KV store with frequent updates)



I would like to keep track of the DNS queries that are made in our environment. I defined a KV store collection and a lookup as follows:

collection = passive_dns
external_type = kvstore
fields_list = _key,domain,count,client_count,first,last

field.count = number
field.client_count = number
field.domain = string
field.first = time
field.last = time
replicate = false
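
For readers reproducing this setup, here is a sketch of where these settings would normally live. The stanza headers are not shown in the post; since both the lookup and the collection are called passive_dns, the sketch assumes both stanzas use that name. The accelerated_fields line is an addition that is not part of the original configuration, and domain_accel is only an illustrative name. It asks the KV store to index the domain field, which may help searches that match on domain.

```
# transforms.conf
[passive_dns]
collection = passive_dns
external_type = kvstore
fields_list = _key,domain,count,client_count,first,last

# collections.conf
[passive_dns]
field.count = number
field.client_count = number
field.domain = string
field.first = time
field.last = time
replicate = false
# Assumption, not in the original post: index the field used for matching
accelerated_fields.domain_accel = {"domain": 1}
```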

This is the query I use to populate the fields:

index=dns sourcetype=clientdns   
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain 
| append maxout=0 [inputlookup passive_dns ] 
| stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key by domain 
| outputlookup passive_dns

The idea is to keep track of how many times a domain was resolved and when it was first and last resolved. (I will add query types and IPs later on.)

If I run this query every 15 minutes, it starts taking more than 15 minutes to run after a couple of executions because the KV store keeps growing. (We have about 4 million DNS resolutions per hour.)

This is obviously not the right way to do it. Is there a way to avoid rewriting the entire lookup on every execution and update only the required entries, and/or to increase the speed of the KV store updates?



I was able to speed up the search by filtering out entries in the lookup that did not change:

index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain 
| append maxout=0 [inputlookup passive_dns | eval hash_before=md5(""+domain+first+last+count+client_count) ] 
| stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key values(hash_before) as hash_before by domain 
| eval hash_after=md5(""+domain+first+last+count+client_count) 
| eval entry_status=case(isnull(hash_before), "new", hash_before!=hash_after,"changed",hash_before==hash_after,"same",true(),"strange") 
| where entry_status!="same" 
| fields _key domain client_count count first last 
| outputlookup append=True passive_dns

The query currently takes less than 60 seconds. The lookup will grow quite a bit, so I will find out how quickly performance degrades (and whether the lookup is still usable once it contains millions of entries).

Update 2

I noticed that | append maxout=0 hits the maxresultrows limit in the [searchresults] stanza of limits.conf. I should probably not increase that limit too much.
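
One way to avoid appending the whole collection (and so stay clear of maxresultrows) might be to use the lookup command to pull in existing values only for the domains seen in the current window, merge them with eval, and write back just those rows. This is a sketch, not a tested solution. The old_* field names are made up for illustration. It assumes each domain appears at most once in the collection, and that the scheduled time windows do not overlap.

```
index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| lookup passive_dns domain OUTPUT _key count as old_count client_count as old_client_count first as old_first last as old_last
| eval count=count+coalesce(old_count,0), client_count=max(client_count,coalesce(old_client_count,0)), first=min(first,coalesce(old_first,first)), last=max(last,coalesce(old_last,last))
| fields _key domain client_count count first last
| outputlookup append=true passive_dns
```

Rows that matched an existing entry carry its _key and should update that record. Rows without a _key should be inserted as new records. Domains that did not occur in the window are never read or rewritten, so the work per run depends on the window rather than on the size of the collection.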



You should use a summary index instead of an outputlookup.
You could summarize the DNS resolution stats every 15 minutes and write them to the summary index, then query the summary index to generate stats over a longer period.
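
A minimal sketch of that approach, assuming a summary index named summary_dns (a hypothetical name; the index has to be created first) and a scheduled search that runs every 15 minutes over the previous 15 minutes:

```
index=dns sourcetype=clientdns earliest=-15m@m latest=@m
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| collect index=summary_dns
```

The long-term view would then be computed from the summary events instead of the raw DNS logs:

```
index=summary_dns
| stats sum(count) as count max(client_count) as client_count min(first) as first max(last) as last by domain
```

As in the original query, max(client_count) is only the largest per-window distinct count, not the true number of distinct clients over the whole period. Each run only appends new summary events, so nothing that already exists has to be rewritten.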


The lookup contains 7 million entries now. Querying it takes about 5-10 seconds, which is OK, but updating it now takes up to 45 minutes. I'll try some other way of doing this.
