Hi Splunkers
Thank you so much for implementing the Splunk Stream fields "query" and "answer".
This already helps a lot.
I'm still wondering how you solved this. I assume it's done in Splunk logic and not in streamfwd; I just want to make sure I don't turn off a field that is required to compute it.
But there is some room for fine-tuning.
The current implementation handles
What it does not yet seem to handle:
Queries with multiple responses, like google.com
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 24 IN A 173.194.40.35
google.com. 24 IN A 173.194.40.36
google.com. 24 IN A 173.194.40.32
google.com. 24 IN A 173.194.40.39
google.com. 24 IN A 173.194.40.40
google.com. 24 IN A 173.194.40.33
google.com. 24 IN A 173.194.40.41
google.com. 24 IN A 173.194.40.46
google.com. 24 IN A 173.194.40.38
google.com. 24 IN A 173.194.40.37
google.com. 24 IN A 173.194.40.34
Also, it seems (we are still verifying this), queries for authoritative name servers, like
;; QUESTION SECTION:
;ns4.google.com. IN A
;; ANSWER SECTION:
ns4.google.com. 345600 IN A 216.239.38.10
;; AUTHORITY SECTION:
google.com. 80463 IN NS ns1.google.com.
google.com. 80463 IN NS ns4.google.com.
google.com. 80463 IN NS ns3.google.com.
google.com. 80463 IN NS ns2.google.com.
;; ADDITIONAL SECTION:
ns1.google.com. 80463 IN A 216.239.32.10
ns2.google.com. 80463 IN A 216.239.34.10
ns3.google.com. 80463 IN A 216.239.36.10
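To make the two cases above easier to compare against what ends up in Splunk, here is a minimal Python sketch that groups dig-style output by section. It is not part of Stream or streamfwd; the function name `parse_dig_sections` and the dictionary keys are assumptions made for illustration only.

```python
from collections import defaultdict


def parse_dig_sections(text):
    """Group dig-style records by section (question, answer, authority, additional).

    Hypothetical helper for illustration; it only understands the plain
    "name ttl class type rdata" layout shown in the examples above.
    """
    sections = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";;"):
            # Section headers look like ";; ANSWER SECTION:"; other ";;" lines are ignored.
            if line.endswith("SECTION:"):
                current = line[2:-len("SECTION:")].strip().lower()
            continue
        if current is None:
            continue
        if line.startswith(";"):
            # Question lines look like ";google.com. IN A" (no TTL, no rdata).
            parts = line[1:].split()
            if len(parts) == 3:
                name, rclass, rtype = parts
                sections[current].append({"name": name, "class": rclass, "type": rtype})
            continue
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue
        name, ttl, rclass, rtype, rdata = parts
        sections[current].append(
            {"name": name, "ttl": int(ttl), "class": rclass, "type": rtype, "rdata": rdata}
        )
    return dict(sections)
```

For the ns4.google.com example this yields one answer record, four authority records and three additional records, which is the structure a single "answer" field would have to represent.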
Thank you
Hi Mathias,
Which flow fields do we really need to have the "query" and "answer" field?
Not sure what you mean by flow fields? The only SPL magic that you may need to do with regard to "stream:dns" events is to normalize the name, host_addr, host_type (and maybe other) fields, since they may be single-value as well as multi-value (MV) fields. This is a known pain with Stream and we're working on addressing it. What do you see if you run an SPL query such as one of these:
sourcetype="stream:dns" dest_port=53 query_type="A" | table query, reply_code, name{}, host_addr{}, host_type{}
sourcetype="stream:dns" dest_port=53 query_type="A" | table query, reply_code, name, host_addr, host_type
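If the events are post-processed outside of SPL (for example from an export of the raw JSON), the single-value versus multi-value problem described above can be handled with a small normalization step. The following Python sketch is only an illustration: the helper names and the sample event are assumptions, and the real "stream:dns" event layout may differ.

```python
def as_list(value):
    """Return value as a list, whether it was missing, a scalar, or already a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def normalize_dns_event(event, fields=("name", "host_addr", "host_type")):
    """Copy the event and force the given fields to always be lists.

    The field names come from the thread; the event shape is an assumption.
    """
    normalized = dict(event)
    for field in fields:
        normalized[field] = as_list(event.get(field))
    return normalized


# Hypothetical events: one with a single address, one with several.
single = normalize_dns_event({"query": "ns4.google.com", "host_addr": "216.239.38.10"})
multi = normalize_dns_event(
    {"query": "google.com", "host_addr": ["173.194.40.35", "173.194.40.36"]}
)
```

After this step every consumer can iterate over `host_addr` without checking its type first.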
Re: estimating the data volume needs. You can turn the dns stream (or streams) into "stats-only" mode. In that mode streamfwd does pretty much everything it would do when that stream (dns, or any other one for that matter) is enabled: it captures and processes the data and builds an event in memory, but it does not actually send the data to the Splunk indexer. Instead, it sends aggregated statistics about the size of the events, which you can see on the App for Stream's "Stream Stats" dashboard. So it's completely free and doesn't affect your license data volume at all. You'll need to set up _internal index forwarding on your UF layer running Splunk_TA_stream though, as the stats events are sent to _internal.
@Re
Sounds good. Regarding the _internal index: if I remember correctly, this is documented in the Stream setup documentation, something like whitelist.2.index(_internal, ...). This should do it, right?
@"query" and "answer"
The Splunk Stream app allows you to configure the data extracted by streamfwd. This consists of multiple tiers of extraction.
flow. declares layer 2-4 information, i.e. flow information without "DPI", extracted by streamfwd
dns. declares DNS data extracted by streamfwd
In the dns.xyz variables there is nothing that directly maps to the Splunk Search field "answer".
Also, in the forwarded JSON data there is no key/value pair "answer" : "1.2.3.4". So at the raw event layer "answer" does not exist yet. And since extracting the correct "answer"
So somewhere the answer field is extracted from the event/field. Doing this properly needs a bit of processing of several fields and is not that easy (I'm still working on it).
The question is which elements in the JSON are required to extract the "answer" field.
Do I interpret this correctly: currently the answer field contains the value of host_addr if there is only one IP in the field?
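To make that interpretation concrete, here is a minimal Python sketch of the hypothesized behaviour next to a variant that keeps every address. This is not how Stream is known to compute "answer"; both function names and the event shape are assumptions, and the first function only mirrors the guess stated above.

```python
def _addresses(event):
    """Return host_addr as a list, whether it arrived as a scalar, a list, or not at all."""
    value = event.get("host_addr")
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def answer_single_only(event):
    """Hypothesized behaviour: 'answer' is set only when exactly one address is present."""
    addrs = _addresses(event)
    return addrs[0] if len(addrs) == 1 else None


def answers_all(event):
    """Alternative: keep every address so multi-record responses are not lost."""
    return _addresses(event)


# Hypothetical events built from the dig examples earlier in the thread.
ns_event = {"query": "ns4.google.com", "host_addr": "216.239.38.10"}
multi_event = {"query": "google.com", "host_addr": ["173.194.40.35", "173.194.40.36"]}
```

With these inputs `answer_single_only(multi_event)` returns `None`, which matches the observation that responses with multiple records are not handled yet, while `answers_all(multi_event)` returns both addresses.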
Hi
As I mentioned in a previous post
http://answers.splunk.com/answers/291111/splunk-app-for-stream-when-analyzing-dns-data-is-t.html
we are trying to build passive DNS (pDNS) from wire data. Alternatively we could use proxy logs, but many of our customers don't have a proxy.
pDNS is a great tool to track down malware.
What is the minimum needed
Additional Information that is used/should be used to prevent malicious behaviour or generally used in pDNS
Additional Information to track down internal problem
Generally Splunk Streams is nearly there 😄
I hope this helps
Hi Mathias,
Thanks for the clarification. I agree that Stream implements most of the requirements you outlined, the only missing part is that the multi-value data is "jammed"/poorly structured (and we're working on it..)
thanks again,
Vladimir
Sounds good
We are now using the current version to get a feeling for what is possible, i.e. we are running a pilot.
What I would like to know is:
Which flow fields do we really need to have the "query" and "answer" field?
I'm pretty sure you don't (currently) get these fields from streamfwd, and they are not selectable in the Stream config dashboard. I assume you use some Splunk magic to extract them.
We need to know the minimum data footprint so that we can estimate the needs, or at least have some facts when the pilot outcomes are discussed. "We don't know" is usually a bad argument, and it gives the critics a free pass: "too much data .... too expensive ... don't like the color ..."
We are aware that working on wire data is anything but trivial (yay MPLS), as we also build some tools. So I have the utmost respect for this constantly improving tool.
For our other use case we built an HTTP dissector. We cannot use Splunk Stream since we only tap outgoing traffic, and Splunk Stream requires bi-directional traffic. We understand why it does, and we tested it, but with our current architecture it is not viable.
PS: our prototype is VERY basic, and we are aware that we only have a partial/limited view, but it already helped us in one case to confirm our suspicions.
hi mathiask,
Great to hear you find Stream useful, and thanks for your feedback! Could you please provide more details on what you're trying to accomplish so that we can try* devising a workaround?
thanks
vladimir
*backstory: Stream currently has a somewhat limited event model (which for the most part is basically a flat list of key:value pairs). I am guessing this may be the root cause of the issues you're experiencing, since the current model works reasonably well for many protocols but has problems representing more complex structures like the DNS responses you've mentioned. All I can say for now is "we're aware of this limitation and working on addressing it".