Getting Data In

Given NetFlow data with start/end times and total bytes transferred, calculate total bandwidth used per second

pestatp
Path Finder

I am looking for an efficient way to calculate the total bandwidth used per second on a device from our NetFlow data. The NetFlow data we receive contains a start and end time for each flow (timestamp and endtime, respectively) as well as the total bytes transferred. It is simple enough to calculate BPS for each flow, but I cannot figure out how to calculate total bandwidth in a usable manner.

Example NetFlow data:

{"endtime":"2020-03-02T17:35:31.850000Z","timestamp":"2020-03-02T17:04:51.630000Z","bytes_in":64,"dest_ip":"xxx.xxx.187.28","dest_mask":0,"dest_port":5061,"dest_sysnum":0,"event_name":"netFlowData","exporter_ip":"10.136.57.2","exporter_sampling_interval":1000,"exporter_sampling_mode":1,"exporter_time":"2020-Mar-02 17:35:22","exporter_uptime":1553552496,"flow_end_rel":1553562346,"flow_start_rel":1551722126,"ingress_vlan":103,"input_snmpidx":114,"netflow_version":9,"nexthop_addr":"0.0.0.0","observation_domain_id":0,"output_snmpidx":0,"packets_in":1,"protoid":6,"seqnumber":54418,"src_ip":"10.136.216.199","src_mask":0,"src_port":1028,"src_sysnum":0,"tcp_flags":16,"tos":184}
{"endtime":"2020-03-02T17:35:31.820000Z","timestamp":"2020-03-02T16:54:11.510000Z","bytes_in":68,"dest_ip":"xxx.xxx.187.28","dest_mask":0,"dest_port":5061,"dest_sysnum":0,"event_name":"netFlowData","exporter_ip":"10.136.57.2","exporter_sampling_interval":1000,"exporter_sampling_mode":1,"exporter_time":"2020-Mar-02 17:35:32","exporter_uptime":1553562496,"flow_end_rel":1553562316,"flow_start_rel":1551082006,"ingress_vlan":54,"input_snmpidx":49,"netflow_version":9,"nexthop_addr":"0.0.0.0","observation_domain_id":0,"output_snmpidx":0,"packets_in":1,"protoid":6,"seqnumber":54509,"src_ip":"10.136.189.15","src_mask":0,"src_port":1028,"src_sysnum":0,"tcp_flags":16,"tos":0}
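
The underlying per-second math can be sketched in plain Python (an illustration only, not the Splunk search; it reuses the two sample flows above):

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustration of the per-second bandwidth math (not the Splunk search itself).
# Each flow's bytes are spread evenly across the seconds it was active, then
# the per-flow rates are summed for every second.
flows = [
    # (timestamp, endtime, bytes_in) from the sample events above
    ("2020-03-02T17:04:51.630000Z", "2020-03-02T17:35:31.850000Z", 64),
    ("2020-03-02T16:54:11.510000Z", "2020-03-02T17:35:31.820000Z", 68),
]

def to_epoch(ts):
    # Parse the NetFlow UTC timestamps, e.g. 2020-03-02T17:04:51.630000Z
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

total_bps = defaultdict(float)  # epoch second -> total bytes per second
for start, end, nbytes in flows:
    s, e = to_epoch(start), to_epoch(end)
    duration = max(e - s, 1)        # guard against zero-length flows
    rate = nbytes / duration        # this flow's bytes per second
    for sec in range(s, e):         # credit the rate to each active second
        total_bps[sec] += rate
```

Summing `total_bps` over all seconds conserves the total bytes (64 + 68 = 132), which is the sanity check to apply to any SPL version of this.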

I have been able to come up with a solution, but it only works with very small timeframes. I would like something that is significantly more robust. The search below only works with a very limited number of events:

sourcetype=stream:netflow
| dedup src_ip,src_port,dest_ip,dest_port,timestamp,exporter_ip
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z")
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z")
| eval diff_secs = end_time-start_time
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs)
| addinfo 
| eval start_time_adj=if(start_time<info_min_time,info_min_time,start_time)
| eval temp=mvrange(start_time_adj,end_time)
| mvexpand temp 
| rename temp AS _time 
| bucket span=1s _time
| timechart sum(bps) as total_bps
1 Solution

pestatp
Path Finder

The search that works best for me in this scenario, modified from to4kawa's answer, is:

| makeresults 
| eval _raw="{\"endtime\":\"2020-03-02T17:35:31.850000Z\",\"timestamp\":\"2020-03-02T17:04:51.630000Z\",\"bytes_in\":64,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:22\",\"exporter_uptime\":1553552496,\"flow_end_rel\":1553562346,\"flow_start_rel\":1551722126,\"ingress_vlan\":103,\"input_snmpidx\":114,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54418,\"src_ip\":\"10.136.216.199\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":184}#
      {\"endtime\":\"2020-03-02T17:35:31.820000Z\",\"timestamp\":\"2020-03-02T16:54:11.510000Z\",\"bytes_in\":68,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:32\",\"exporter_uptime\":1553562496,\"flow_end_rel\":1553562316,\"flow_start_rel\":1551082006,\"ingress_vlan\":54,\"input_snmpidx\":49,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54509,\"src_ip\":\"10.136.189.15\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":0}" 
| makemv delim="#" _raw 
| stats count by _raw 
| rename COMMENT as "this is sample"
| spath 
| fields - _* count 
| dedup src_ip,src_port,dest_ip,dest_port,exporter_ip,timestamp
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z")
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z")
| eval diff_secs = (end_time-start_time)+1
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs)
| addinfo
| eval start_time_adj=if(start_time<info_min_time,info_min_time,start_time)
| eval temp=mvrange(start_time_adj,end_time+1)
| table exporter_ip bps temp
| mvexpand temp
| rename temp AS _time 
| bucket span=1s _time
| timechart cont=f partial=f sum(bps) as total_bps by exporter_ip

A couple of the changes involve the mvrange start time. If you don't clamp it to the start of your selected time range, the timechart will display blank intervals all the way back to the earliest start timestamp in your data, which in my case is always significantly before the time range I want to see. I also split the results by exporter_ip, which corresponds to the IP of the network device sending the data.
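
The clamping described above can be sketched outside SPL; the function name here is illustrative, as a minimal Python analogue of start_time_adj and mvrange(start_time_adj, end_time+1):

```python
# Minimal analogue of the SPL adjustment: a flow that started before the
# search window only contributes seconds from the window start onward.
def flow_seconds(start_time, end_time, info_min_time):
    start_adj = max(start_time, info_min_time)    # start_time_adj in the SPL
    return list(range(start_adj, end_time + 1))   # mvrange(start_time_adj, end_time+1)
```

For example, a flow spanning epoch seconds 100-105 in a window starting at 103 contributes only seconds 103 through 105.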

to4kawa
Ultra Champion

| makeresults 
| eval _raw="{\"endtime\":\"2020-03-02T17:35:31.850000Z\",\"timestamp\":\"2020-03-02T17:04:51.630000Z\",\"bytes_in\":64,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:22\",\"exporter_uptime\":1553552496,\"flow_end_rel\":1553562346,\"flow_start_rel\":1551722126,\"ingress_vlan\":103,\"input_snmpidx\":114,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54418,\"src_ip\":\"10.136.216.199\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":184}#
     {\"endtime\":\"2020-03-02T17:35:31.820000Z\",\"timestamp\":\"2020-03-02T16:54:11.510000Z\",\"bytes_in\":68,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:32\",\"exporter_uptime\":1553562496,\"flow_end_rel\":1553562316,\"flow_start_rel\":1551082006,\"ingress_vlan\":54,\"input_snmpidx\":49,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54509,\"src_ip\":\"10.136.189.15\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":0}" 
| makemv delim="#" _raw 
| stats count by _raw 
| rename COMMENT as "this is sample"
| spath 
| fields - _* count 
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z") 
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z") 
| eval diff_secs = end_time-start_time 
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs) 
| eval temp=mvrange(start_time,end_time) 
| stats values(bps) as bps by temp 
| rename temp AS _time 
| bucket span=1s _time 
| timechart partial=f span=10m sum(bps) as total_bps

Hi @pestatp,
Your query is good, but mvexpand can't handle huge multivalue fields, so try stats by instead.

pestatp
Path Finder

The reason I said it only works with a limited number of events is that mvexpand generates a seriously large number of results. Running this on 15 minutes of my data this morning created 12 million results from ~32,000 actual NetFlow events. If I attempt to view anything longer than about 15 minutes, my results get truncated due to memory. I have already raised the mvexpand memory limit to 2,048.
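
The blow-up is easy to estimate, since mvexpand emits one row per second of each flow's lifetime; using the figures above (a back-of-the-envelope check, using only the numbers stated in the post):

```python
# One expanded row per second of each flow's duration, so dividing rows by
# flows gives the average flow duration implied by the numbers above.
flows = 32_000              # NetFlow events in the 15-minute window
expanded_rows = 12_000_000  # results produced by mvexpand
avg_duration_secs = expanded_rows / flows  # average seconds per flow
```

So long-lived flows (~375 seconds on average here) multiply the result set by two or three orders of magnitude.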

I was really hoping to find a less "expensive" way to accomplish this.

to4kawa
Ultra Champion

@pestatp
I see, mvexpand has a limit; you can modify it in limits.conf, or use stats by instead.
My answer is updated, please confirm.

pestatp
Path Finder

Using stats by prevented the results from being truncated, and it seems a bit faster, but it is still fairly slow.

This query took 48 seconds for 15 minutes worth of data:

sourcetype=stream:netflow
| fields - _* count 
| dedup src_ip,src_port,dest_ip,dest_port,exporter_ip,timestamp
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z")
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z")
| eval diff_secs = (end_time-start_time)+1
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs)
| addinfo
| eval start_time_adj=if(start_time<info_min_time,info_min_time,start_time)
| eval temp=mvrange(start_time_adj,end_time+1)
| table exporter_ip bps temp
| stats values(bps) as bps by temp, exporter_ip
| rename temp AS _time 
| bucket span=1s _time
| timechart cont=f partial=f sum(bps) as total_bps by exporter_ip

It's too bad that there isn't something like concurrency that can sum a field instead of just counting events.

to4kawa
Ultra Champion

This query generates a huge number of one-second bps rows, so it's very slow.
If you only need a single period, a different query would be more useful.

pestatp
Path Finder

My solution was to modify your answer a little, but the biggest thing I did was upgrade our Splunk indexer. The original server was getting quite old and was too slow for a query like this. The upgrade alone decreased the job time from 48 seconds to less than 2 for the same query.

to4kawa
Ultra Champion

Great! I'm surprised it changed that much.
