I would like to keep track of the DNS queries that are made in our environment. I defined a KV store collection and a lookup as follows:
collection = passive_dns
external_type = kvstore
fields_list = _key,domain,count,client_count,first,last
field.count = number
field.client_count = number
field.domain = string
field.first = time
field.last = time
replicate = false
This is the query I use to populate the fields:
index=dns sourcetype=clientdns | stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query | rename query as domain | append maxout=0 [inputlookup passive_dns ] | stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key by domain | outputlookup passive_dns
The idea is to keep track of how many times a domain was resolved and when it was first/last resolved. (I will add query types and IPs later on.)
If I run this query every 15 minutes, it starts taking more than 15 minutes to run after a couple of executions because the KV store keeps growing. (We have about 4 million DNS resolutions per hour.)
This is obviously not the right way to do it. Is there a way to avoid rewriting the entire lookup on every execution and update only the entries that changed, and/or to speed up the KV store updates?
I was able to speed up the search by filtering out entries in the lookup that did not change:
index=dns sourcetype=clientdns | stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query | rename query as domain |eval hash_before=md5(""+domain+first+last+count+client_count) | append maxout=0 [inputlookup passive_dns ] | stats max(client_count) as client_count sum(count) as count min(first) as first max(last) as last values(_key) as _key values(hash_before) as hash_before by domain | eval hash_after=md5(""+domain+first+last+count+client_count) | eval entry_status=case(isnull(hash_before), "new", hash_before!=hash_after,"changed",hash_before==hash_after,"same",true(),"strange") | where entry_status!="same" | fields domain client_count count first last | outputlookup append=True passive_dns
The query time is less than 60s at the moment. The lookup will grow quite a bit. I will find out how quickly the performance will degrade (and if the lookup is still usable once it contains millions of entries).
I noticed that | append maxout=0 hits the maxresultrows limit in limits.conf [searchresults]. I should probably not increase that limit too much.
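One possible way to sidestep the append limit, sketched here under assumptions rather than tested against this data set: enrich the fresh results with the lookup command instead of loading the whole collection through append, then write back with outputlookup append=true while keeping _key, so that existing records are updated by key and rows without a key become new records. The old_* field names are hypothetical helpers introduced only for this sketch, and how well lookup performs against a collection with millions of entries would need to be measured.

```spl
index=dns sourcetype=clientdns
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| lookup passive_dns domain OUTPUT _key count as old_count client_count as old_client_count first as old_first last as old_last
| eval count = count + coalesce(old_count, 0)
| eval client_count = max(client_count, coalesce(old_client_count, 0))
| eval first = min(first, coalesce(old_first, first))
| eval last = max(last, coalesce(old_last, last))
| fields _key domain count client_count first last
| outputlookup append=true passive_dns
```

Only domains seen in the current search window reach outputlookup, so unchanged entries are never read or rewritten. As in the original query, client_count is the largest per-window distinct count, not a true distinct count over all time.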
You should use a summary index (http://docs.splunk.com/Documentation/Splunk/7.1.0/Knowledge/Usesummaryindexing) instead of an outputlookup.
You could summarize the DNS resolution stats every 15 minutes and write them to the summary index, then query the summary index to generate stats over a longer period.
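A minimal sketch of that approach, assuming a summary index named dns_summary already exists (the name is hypothetical) and that the first search is scheduled every 15 minutes. It reuses the field names from the question and writes one summary event per domain per window with collect:

```spl
index=dns sourcetype=clientdns earliest=-15m@m latest=@m
| stats count dc(src) as client_count earliest(_time) as first latest(_time) as last by query
| rename query as domain
| collect index=dns_summary
```

The long-term view would then be computed from the summary events at search time, for example:

```spl
index=dns_summary
| stats sum(count) as count max(client_count) as client_count min(first) as first max(last) as last by domain
```

Because each run only appends the current window, the scheduled search does not slow down as history accumulates. The trade-off is that the totals are computed when the report runs, and client_count is again the largest per-window value rather than an overall distinct count.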
The lookup contains 7 million entries now. Querying it takes about 5-10 seconds which is ok. But updating it takes up to 45 minutes now. I'll try some other way of doing this.