Solved: How to create transaction where single value matches multi-value field?

asleeis · ‎12-29-2016

Hi,

I'm working with some DNS query logs (actually timestamped tcpdump output) and trying to match them to firewall logs. In the firewall log, I have the destination IP. With the DNS query logs, I have a hostname (request) matching to one or more response values (not always an IP but usually). I'm trying to join the firewall entry with the corresponding DNS query. But I'm unsure how (if possible) to put them together in a transaction.

source="/var/log/dnsqueries.log"  

| rex field=_raw "IP (?<src_ip>\d+\.\d+\.\d+\.\d+).(?<src_port>\d+) +\> +(?<dst_ip>\d+\.\d+\.\d+\.\d+).(?<dst_port>\d+): +(?<reqid>\d+)[^\d] *(\[[^\]]+\] )?((?<q_message>(?<q_rectype>[A-Z]+)\? (?<q_value>[^ ]+))|\d+/\d+/\d+ +(?<r_message>(?<r_rectype>[^ ]+) +(?<r_value>.+)) \(\d+\))?" 

| eval req_ip=if(src_port=53, dst_ip, src_ip) 
| eval req_port=if(src_port=53, dst_port, src_port) 
| eval svr_ip=if(src_port=53, src_ip, dst_ip)  

| rex field=r_message max_match=25 "(?<r_t>[^ ]+) +(?<r_v>[^ ,]+)(, )?"
| eval r_tv=mvzip(r_t,r_v,"#")  

| append [
    search (source="/var/log/firewall.log" SRC=123.123.123.123)
]

| eval extip=if(isnotnull(DST), DST, r_v) 
| eval reqid=if(isnull(reqid), SPT, reqid) 
| eval req_port=if(isnull(req_port), SPT, req_port) 

| transaction reqid req_port maxspan=2s 
| transaction extip maxspan=5s

I am getting the first transaction matching without issue (pairing request and response of DNS queries). But I can't figure out how to match the multi-value list I created. I want it to join if the single value of the FW entry matches any of the MV list of the DNS response.

Here's an example of data this is trying to join:

Dec 29 12:56:37 ec2 ec2-dns-requests: 12:56:37.990311 IP 123.123.123.123.53332 > 123.123.123.2.53: 54671+ A? 5-8-5-app.agent.datadoghq.com. (47)

Dec 29 12:56:37 ec2 ec2-dns-requests: 12:56:37.991417 IP 123.123.123.2.53 > 123.123.123.123.53332: 54671 9/0/0 CNAME dualstack.agent-520-209329848.us-east-1.elb.amazonaws.com., A 54.225.245.134, A 54.243.126.149, A 54.243.194.58, A 54.225.209.18, A 54.225.213.216, A 54.225.214.228, A 54.225.216.202, A 54.225.223.2 (243)

Dec 29 12:56:38 ec2 kernel: iptables-connections: IN= OUT=eth0 SRC=123.123.123.123 DST=54.225.245.134 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=37699 DF PROTO=TCP SPT=38976 DPT=443 WINDOW=26883 RES=0x00 SYN URGP=0

Essentially, I'm trying to match each connection to its DNS request so I can readily see what the query was for outbound connections, using only the log data that has been made available to me. In this example, I know a connection was made to 54.225.245.134, and I'd like to tie it to the fact that it's related to the request for "5-8-5-app.agent.datadoghq.com". That way, when dealing with data in bulk, I can include the requested hostname next to the various connections. It's like a lookup, except it's driven by the capture of DNS queries in near-realtime.
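To make the matching rule concrete outside of Splunk, here is a minimal Python sketch, not SPL. It uses the two sample lines above and checks whether the firewall DST appears among the values of the DNS response within a short time window. The function names, the 5-second window, and the assumed year 2016 (syslog timestamps carry no year) are illustrative assumptions, not part of the original search.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the post above.
DNS_RESPONSE = (
    "Dec 29 12:56:37 ec2 ec2-dns-requests: 12:56:37.991417 IP 123.123.123.2.53 > "
    "123.123.123.123.53332: 54671 9/0/0 CNAME "
    "dualstack.agent-520-209329848.us-east-1.elb.amazonaws.com., A 54.225.245.134, "
    "A 54.243.126.149, A 54.243.194.58, A 54.225.209.18, A 54.225.213.216, "
    "A 54.225.214.228, A 54.225.216.202, A 54.225.223.2 (243)"
)
FIREWALL = (
    "Dec 29 12:56:38 ec2 kernel: iptables-connections: IN= OUT=eth0 "
    "SRC=123.123.123.123 DST=54.225.245.134 LEN=60 TOS=0x00 PREC=0x00 TTL=64 "
    "ID=37699 DF PROTO=TCP SPT=38976 DPT=443 WINDOW=26883 RES=0x00 SYN URGP=0"
)


def syslog_time(line, year=2016):
    # The first 15 characters are the syslog timestamp; the year is assumed.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def response_values(line):
    # Text between the "n/n/n" record counts and the trailing "(size)".
    answers = re.search(r"\d+/\d+/\d+ +(.+) \(\d+\)$", line).group(1)
    # Each answer is "<type> <value>", and answers are separated by ", ".
    return [tuple(a.split(None, 1)) for a in answers.split(", ")]


def firewall_dst(line):
    return re.search(r"\bDST=(\S+)", line).group(1)


def matches(dns_line, fw_line, window=timedelta(seconds=5)):
    # True if the connection's DST is any one of the response values and the
    # connection follows the response within the window.
    values = {value for _, value in response_values(dns_line)}
    delta = syslog_time(fw_line) - syslog_time(dns_line)
    return firewall_dst(fw_line) in values and timedelta(0) <= delta <= window


print(matches(DNS_RESPONSE, FIREWALL))
```

The membership test against the set of values is the "single value matches any of the MV list" condition; the time check plays the role that maxspan plays in the transaction.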

Any insight or suggestions as to how to join/match this data would be greatly appreciated.

Thanks,
-Alex

asleeis · ‎12-30-2016

Okay... thanks to @richgalloway for some suggestions about the mv* eval functions. While not directly applicable, it got me looking down the path of manipulating the MV fields as part of my solution.

The short of it: I first parse my fields as before, then use the mvexpand command. I had been trying to use it already, but I think I was doing so incorrectly. The idea is to create one copy of the event for each value in the MV field I care about (duplicating the other data in the event), then match the one expanded event that carries the right value to the traffic log entry, then massage the result into the output.
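For readers less familiar with mvexpand, the expansion step can be pictured with a small Python sketch. This is an illustration of the idea only, not how Splunk implements the command; the `mvexpand` function and the trimmed sample record below are hypothetical stand-ins.

```python
def mvexpand(records, field):
    """Yield one copy of each record per value in its multi-value field.

    Records where the field is missing or empty pass through unchanged.
    """
    for rec in records:
        values = rec.get(field)
        if not isinstance(values, list) or not values:
            yield dict(rec)
            continue
        for value in values:
            # Every other field is duplicated; only `field` becomes single-valued.
            yield {**rec, field: value}


# A paired DNS request/response, trimmed to three answers for brevity.
record = {
    "q_value": "5-8-5-app.agent.datadoghq.com.",
    "r_tv": [
        "CNAME dualstack.agent-520-209329848.us-east-1.elb.amazonaws.com.",
        "A 54.225.245.134",
        "A 54.243.126.149",
    ],
}

rows = list(mvexpand([record], "r_tv"))
for row in rows:
    # Split "<type> <value>" the same way the second rex does.
    row["r_t"], row["r_v"] = row["r_tv"].split(" ", 1)
    print(row["q_value"], row["r_t"], row["r_v"])
```

After expansion, each row has a single r_v, so an exact-match join against the firewall DST becomes possible.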

This is what that ended up looking like...

(source="/var/log/ec2-dnsqueries.log" "ec2-dns-requests" NOT("in-addr.arpa."))

| rex field=_raw "IP (?<src_ip>\d+\.\d+\.\d+\.\d+)\.(?<src_port>\d+) +\> +(?<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?<dst_port>\d+): +(?<reqid>\d+)[^\d] *(\[[^\]]+\] )?((?<q_message>(?<q_rectype>[A-Z]+)\? (?<q_value>[^ ]+))|\d+/\d+/\d+ +(?<r_message>[^ ]+ +.+) \(\d+\))?" 

| eval req_ip=if(src_port=53, dst_ip, src_ip) 
| eval req_port=if(src_port=53, dst_port, src_port) 
| eval svr_ip=if(src_port=53, src_ip, dst_ip)  

| transaction reqid req_port maxspan=1s 

| rex field=r_message max_match=25 "(?<r_tv>[^ ]+ +[^ ,]+)(, )?"
| mvexpand r_tv | rex field=r_tv max_match=25 "(?<r_t>.+) (?<r_v>.+)"

| append [
    search (source="/var/log/firewall.log" SRC=123.123.123.123)
] 
| sort +_time

| eval extip=if(isnotnull(DST), DST, r_v) 

| transaction extip maxspan=1s 

| search DST=*
| table _time, req_ip, DST, q_value, r_t, r_v

The additional search near the end is to ignore all the DNS records that weren't matched to the firewall connection record. I'm sure I could refine this to be a bit more efficient, but this results in what I was looking for...

_time                req_ip           DST             q_value                         r_t  r_v
2016-12-29 12:56:37  123.123.123.123  54.225.245.134  5-8-5-app.agent.datadoghq.com.  A    54.225.245.134
2016-12-29 12:56:07  123.123.123.123  54.225.154.127  5-8-5-app.agent.datadoghq.com.  A    54.225.154.127

A nice neat output of the connections, tied back to the nearest DNS query that directed that system to that IP (so I know the hostname queried).

Hopefully this will help anyone else doing something similar. One of my first mistakes was running mvexpand before the first transaction; the transaction has to come first. The other way around, the expanded events didn't include the request portion of the DNS query, along with all sorts of other unusefulness.
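The ordering point can be shown with a simplified Python sketch. It assumes that pairing is a plain merge on (reqid, req_port) and ignores maxspan, so it is only a rough stand-in for the transaction command; `pair` and `expand` are hypothetical helpers.

```python
from collections import defaultdict

# One DNS query and its response, as separate events sharing reqid/req_port.
events = [
    {"reqid": "54671", "req_port": "53332",
     "q_value": "5-8-5-app.agent.datadoghq.com."},
    {"reqid": "54671", "req_port": "53332",
     "r_tv": ["A 54.225.245.134", "A 54.243.126.149"]},
]


def pair(events):
    # Merge events that share (reqid, req_port) into one record.
    merged = defaultdict(dict)
    for ev in events:
        merged[(ev["reqid"], ev["req_port"])].update(ev)
    return list(merged.values())


def expand(records, field):
    # One output row per value of `field`; rows without the field get None.
    return [{**rec, field: value}
            for rec in records
            for value in (rec.get(field) or [None])]


# Pair first, then expand: every row carries both the query and one answer.
pair_then_expand = expand(pair(events), "r_tv")

# Expand the raw events without pairing first: the answer rows come only from
# the response event, so none of them has the hostname that was asked for.
expand_only = expand(events, "r_tv")

for row in pair_then_expand:
    print(row["q_value"], row["r_tv"])
```

With pairing done first, the later match on the external IP finds a row that already knows the queried hostname.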

Cheers,
-Alex

asleeis · ‎12-30-2016

Hmmm... well, some things still aren't quite right. I get strange results where, if I expand the time range, things that matched over short periods no longer match. Still, this seems to correctly address the original issue I posted about. My larger problem/solution journey continues. 🙂

richgalloway · ‎12-29-2016

Have you looked at the mvfilter and mvfind functions? The latter looks promising except it calls for a regex string.

---
If this reply helps you, Karma would be appreciated.

asleeis · ‎12-29-2016

That could be used in an eval on a single record, but how would I take a field value from one record (keep in mind my example is a single case, and I'd want to do this en masse) and match it against the MV field in another? And especially while ensuring time proximity. This is where transaction is handy... but as far as I can tell, I can only do an exact match on single-value fields (like session/txn IDs and such).

richgalloway · ‎12-29-2016

eval functions can also be used with the where command.

asleeis · ‎12-29-2016

Sure, but again, how does that help with binding two different records together? I'm not understanding how the use of the functions addresses my need/situation. They're useful functions for working with MV fields, but how are they useful to my scenario?

richgalloway · ‎12-29-2016

Yeah, I was probably a bit hasty with my suggestion. Sorry about that.

asleeis · ‎12-29-2016

No worries. I appreciate the response. I just wasn't sure if I was totally misunderstanding those functions and how I could use them. I've been using Splunk for many years, but mostly at a moderate level, so I wasn't discounting some possible ignorance on my part. 🙂

I suspect I will need to think through a whole other way to try to approach the problem. I just don't know what that would be. 😕
