I would like to keep track of the DNS queries that are made in our environment. I defined a KV store collection and a lookup as follows:
collections.conf:
[passive_dns]
field.domain = string
field.count = number
field.client_count = number
field.first = time
field.last = time
replicate = false
transforms.conf:
[passive_dns]
external_type = kvstore
collection = passive_dns
fields_list = _key,domain,count,client_count,first,last
This is the query I use to populate the lookup:
index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| append maxout=0 [inputlookup passive_dns ]
| stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key by domain
| outputlookup passive_dns
The idea is to keep track of how many times a domain was resolved and when it was first and last resolved (I will add query types and IPs later on).
If I run this query every 15 minutes, it starts taking more than 15 minutes to run after a couple of executions because the KV store keeps growing. (We have about 4 million DNS resolutions per hour.)
This is obviously not the right way to do it. Is there a way to update only the entries that changed instead of rewriting the entire lookup on every execution, and/or to speed up the KV store updates?
I was able to speed up the search by filtering out entries in the lookup that did not change:
index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| eval hash_before=md5(""+domain+first+last+count+client_count)
| append maxout=0 [inputlookup passive_dns ]
| stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key values(hash_before) as hash_before by domain
| eval hash_after=md5(""+domain+first+last+count+client_count)
| eval entry_status=case(isnull(hash_before), "new", hash_before!=hash_after,"changed",hash_before==hash_after,"same",true(),"strange")
| where entry_status!="same"
| fields domain client_count count first last
| outputlookup append=True passive_dns
The query now takes less than 60 seconds. The lookup will still grow quite a bit, so I will have to find out how quickly the performance degrades (and whether the lookup is still usable once it contains millions of entries).
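One thing worth double-checking in the search above: hash_before is computed on the fresh results, not on the rows read from the lookup. As written, a domain that exists only in the lookup has a null hash_before and is classified as "new", while a domain seen for the first time has identical hashes and is classified as "same". The final fields command also drops _key, so outputlookup append=True has nothing to match existing records against. Below is a minimal sketch of a variant that hashes the stored rows instead and keeps _key. It assumes the same field names as above, uses the "." operator so numbers and strings concatenate predictably, and has not been tested against a large collection:

```
index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| append maxout=0
    [| inputlookup passive_dns
     | eval hash_before=md5(domain . "|" . first . "|" . last . "|" . count . "|" . client_count)]
| stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key values(hash_before) as hash_before by domain
| eval hash_after=md5(domain . "|" . first . "|" . last . "|" . count . "|" . client_count)
| eval entry_status=case(isnull(hash_before), "new", hash_before!=hash_after, "changed", true(), "same")
| where entry_status!="same"
| fields _key domain client_count count first last
| outputlookup append=true passive_dns
```

With this ordering, rows that come only from the lookup keep the same hash and are dropped, rows that come only from the new events have no hash_before and are written as new records, and rows present on both sides are written back under their existing _key.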
Update 2
I noticed that | append maxout=0 hits maxresultrows in the [searchresults] stanza of limits.conf. I should probably not increase that limit too much.
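A possible way around the append limit is to avoid loading the whole collection at all and instead enrich only the domains seen in the current window with the lookup command. The following is a sketch under assumptions, not a verified solution: it assumes that the passive_dns lookup returns _key because it is listed in fields_list, and the old_* field names are made up for illustration:

```
index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| lookup passive_dns domain OUTPUT _key count as old_count client_count as old_client_count first as old_first last as old_last
| eval count=count + coalesce(old_count, 0)
| eval client_count=max(client_count, coalesce(old_client_count, 0))
| eval first=min(first, coalesce(old_first, first))
| eval last=max(last, coalesce(old_last, last))
| fields _key domain client_count count first last
| outputlookup append=true passive_dns
```

Only domains that appear in the current 15-minute window are read and written, so the cost per run should depend on the window size rather than on the total size of the collection. Rows without a _key are inserted as new records, and rows with a _key update the existing record. Since every row is looked up by domain, defining an accelerated_fields entry on domain in collections.conf may also help, although I have not measured it.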