Getting Data In

Given NetFlow data with start/end times and total bytes transferred, calculate total bandwidth used per second

pestatp
Path Finder

I am looking for an efficient way to calculate the total bandwidth used per second on a device from our NetFlow data. The NetFlow data we receive contains a start and end time for each flow (timestamp and endtime, respectively) as well as the total bytes transferred. It is simple enough to calculate BPS for each flow, but I cannot figure out how to calculate total bandwidth in a usable manner.

Example NetFlow data:

{"endtime":"2020-03-02T17:35:31.850000Z","timestamp":"2020-03-02T17:04:51.630000Z","bytes_in":64,"dest_ip":"xxx.xxx.187.28","dest_mask":0,"dest_port":5061,"dest_sysnum":0,"event_name":"netFlowData","exporter_ip":"10.136.57.2","exporter_sampling_interval":1000,"exporter_sampling_mode":1,"exporter_time":"2020-Mar-02 17:35:22","exporter_uptime":1553552496,"flow_end_rel":1553562346,"flow_start_rel":1551722126,"ingress_vlan":103,"input_snmpidx":114,"netflow_version":9,"nexthop_addr":"0.0.0.0","observation_domain_id":0,"output_snmpidx":0,"packets_in":1,"protoid":6,"seqnumber":54418,"src_ip":"10.136.216.199","src_mask":0,"src_port":1028,"src_sysnum":0,"tcp_flags":16,"tos":184}
{"endtime":"2020-03-02T17:35:31.820000Z","timestamp":"2020-03-02T16:54:11.510000Z","bytes_in":68,"dest_ip":"xxx.xxx.187.28","dest_mask":0,"dest_port":5061,"dest_sysnum":0,"event_name":"netFlowData","exporter_ip":"10.136.57.2","exporter_sampling_interval":1000,"exporter_sampling_mode":1,"exporter_time":"2020-Mar-02 17:35:32","exporter_uptime":1553562496,"flow_end_rel":1553562316,"flow_start_rel":1551082006,"ingress_vlan":54,"input_snmpidx":49,"netflow_version":9,"nexthop_addr":"0.0.0.0","observation_domain_id":0,"output_snmpidx":0,"packets_in":1,"protoid":6,"seqnumber":54509,"src_ip":"10.136.189.15","src_mask":0,"src_port":1028,"src_sysnum":0,"tcp_flags":16,"tos":0}
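
The underlying per-second math can be sketched in plain Python (an illustration only, not the Splunk search; it reuses the two sample flows above):

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustration of the per-second bandwidth math (not the Splunk search itself).
# Each flow's bytes are spread evenly across the seconds it was active, then
# the per-flow rates are summed for every second.
flows = [
    # (timestamp, endtime, bytes_in) from the sample events above
    ("2020-03-02T17:04:51.630000Z", "2020-03-02T17:35:31.850000Z", 64),
    ("2020-03-02T16:54:11.510000Z", "2020-03-02T17:35:31.820000Z", 68),
]

def to_epoch(ts):
    # Parse the NetFlow UTC timestamps, e.g. 2020-03-02T17:04:51.630000Z
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

total_bps = defaultdict(float)  # epoch second -> total bytes per second
for start, end, nbytes in flows:
    s, e = to_epoch(start), to_epoch(end)
    duration = max(e - s, 1)        # guard against zero-length flows
    rate = nbytes / duration        # this flow's bytes per second
    for sec in range(s, e):         # credit the rate to each active second
        total_bps[sec] += rate
```

Summing `total_bps` over all seconds conserves the total bytes (64 + 68 = 132), which is the sanity check to apply to any SPL version of this.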

I have been able to come up with a solution, but it only works with very small timeframes. I would like something that is significantly more robust. The search below only works with a very limited number of events:

sourcetype=stream:netflow
| dedup src_ip,src_port,dest_ip,dest_port,timestamp,exporter_ip
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z")
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z")
| eval diff_secs = end_time-start_time
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs)
| addinfo 
| eval start_time_adj=if(start_time<info_min_time,info_min_time,start_time)
| eval temp=mvrange(start_time_adj,end_time)
| mvexpand temp 
| rename temp AS _time 
| bucket span=1s _time
| timechart sum(bps) as total_bps
1 Solution

pestatp
Path Finder

The search that works best for me in this scenario, modified from to4kawa's answer, is:

| makeresults 
| eval _raw="{\"endtime\":\"2020-03-02T17:35:31.850000Z\",\"timestamp\":\"2020-03-02T17:04:51.630000Z\",\"bytes_in\":64,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:22\",\"exporter_uptime\":1553552496,\"flow_end_rel\":1553562346,\"flow_start_rel\":1551722126,\"ingress_vlan\":103,\"input_snmpidx\":114,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54418,\"src_ip\":\"10.136.216.199\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":184}#
      {\"endtime\":\"2020-03-02T17:35:31.820000Z\",\"timestamp\":\"2020-03-02T16:54:11.510000Z\",\"bytes_in\":68,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:32\",\"exporter_uptime\":1553562496,\"flow_end_rel\":1553562316,\"flow_start_rel\":1551082006,\"ingress_vlan\":54,\"input_snmpidx\":49,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54509,\"src_ip\":\"10.136.189.15\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":0}" 
| makemv delim="#" _raw 
| stats count by _raw 
| rename COMMENT as "this is sample"
| spath 
| fields - _* count 
| dedup src_ip,src_port,dest_ip,dest_port,exporter_ip,timestamp
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z")
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z")
| eval diff_secs = (end_time-start_time)+1
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs)
| addinfo
| eval start_time_adj=if(start_time<info_min_time,info_min_time,start_time)
| eval temp=mvrange(start_time_adj,end_time+1)
| table exporter_ip bps temp
| mvexpand temp
| rename temp AS _time 
| bucket span=1s _time
| timechart cont=f partial=f sum(bps) as total_bps by exporter_ip

A couple of the changes involve the mvrange start time. If you don't clamp it to the start of your selected time range, the timechart will display blank intervals all the way back to the earliest start timestamp in your data, which in my case is always significantly before the time range I want to see. I also split the results by exporter_ip, which corresponds to the IP of the network device sending the data.
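
The clamping described above can be sketched outside SPL; the function name here is illustrative, as a minimal Python analogue of start_time_adj and mvrange(start_time_adj, end_time+1):

```python
# Minimal analogue of the SPL adjustment: a flow that started before the
# search window only contributes seconds from the window start onward.
def flow_seconds(start_time, end_time, info_min_time):
    start_adj = max(start_time, info_min_time)    # start_time_adj in the SPL
    return list(range(start_adj, end_time + 1))   # mvrange(start_time_adj, end_time+1)
```

For example, a flow spanning epoch seconds 100-105 in a window starting at 103 contributes only seconds 103 through 105.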

to4kawa
Ultra Champion

| makeresults 
| eval _raw="{\"endtime\":\"2020-03-02T17:35:31.850000Z\",\"timestamp\":\"2020-03-02T17:04:51.630000Z\",\"bytes_in\":64,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:22\",\"exporter_uptime\":1553552496,\"flow_end_rel\":1553562346,\"flow_start_rel\":1551722126,\"ingress_vlan\":103,\"input_snmpidx\":114,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54418,\"src_ip\":\"10.136.216.199\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":184}#
     {\"endtime\":\"2020-03-02T17:35:31.820000Z\",\"timestamp\":\"2020-03-02T16:54:11.510000Z\",\"bytes_in\":68,\"dest_ip\":\"xxx.xxx.187.28\",\"dest_mask\":0,\"dest_port\":5061,\"dest_sysnum\":0,\"event_name\":\"netFlowData\",\"exporter_ip\":\"10.136.57.2\",\"exporter_sampling_interval\":1000,\"exporter_sampling_mode\":1,\"exporter_time\":\"2020-Mar-02 17:35:32\",\"exporter_uptime\":1553562496,\"flow_end_rel\":1553562316,\"flow_start_rel\":1551082006,\"ingress_vlan\":54,\"input_snmpidx\":49,\"netflow_version\":9,\"nexthop_addr\":\"0.0.0.0\",\"observation_domain_id\":0,\"output_snmpidx\":0,\"packets_in\":1,\"protoid\":6,\"seqnumber\":54509,\"src_ip\":\"10.136.189.15\",\"src_mask\":0,\"src_port\":1028,\"src_sysnum\":0,\"tcp_flags\":16,\"tos\":0}" 
| makemv delim="#" _raw 
| stats count by _raw 
| rename COMMENT as "this is sample"
| spath 
| fields - _* count 
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z") 
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z") 
| eval diff_secs = end_time-start_time 
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs) 
| eval temp=mvrange(start_time,end_time) 
| stats values(bps) as bps by temp 
| rename temp AS _time 
| bucket span=1s _time 
| timechart partial=f span=10m sum(bps) as total_bps

Hi @pestatp,
Your query is good, but mvexpand can't handle huge multivalue fields, so try stats by instead.

pestatp
Path Finder

The reason I said it only works with a limited number of events is that mvexpand generates a seriously large number of results. Running this on 15 minutes of my data this morning created 12 million results from ~32,000 actual NetFlow events. If I attempt to view anything longer than about 15 minutes, my results get truncated due to memory. I have already raised the mvexpand memory limit to 2,048.
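
The blow-up is easy to estimate, since mvexpand emits one row per second of each flow's lifetime; using the figures above (a back-of-the-envelope check, using only the numbers stated in the post):

```python
# One expanded row per second of each flow's duration, so dividing rows by
# flows gives the average flow duration implied by the numbers above.
flows = 32_000              # NetFlow events in the 15-minute window
expanded_rows = 12_000_000  # results produced by mvexpand
avg_duration_secs = expanded_rows / flows  # average seconds per flow
```

So long-lived flows (~375 seconds on average here) multiply the result set by two or three orders of magnitude.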

I was really hoping to find a less "expensive" way to accomplish this.

to4kawa
Ultra Champion

@pestatp
I see, mvexpand has a limit; you can modify it in limits.conf, or use stats by instead.
My answer is updated, please confirm.

pestatp
Path Finder

Using stats by prevented the results from being truncated, and it seems a bit faster, but it is still fairly slow.

This query took 48 seconds for 15 minutes worth of data:

sourcetype=stream:netflow
| fields - _* count 
| dedup src_ip,src_port,dest_ip,dest_port,exporter_ip,timestamp
| eval start_time = strptime(timestamp . "-0000", "%FT%T.%6QZ%z")
| eval end_time = strptime(endtime . "-0000", "%FT%T.%6QZ%z")
| eval diff_secs = (end_time-start_time)+1
| eval diff = tostring((diff_secs), "duration") 
| eval bps=if(isnull(bytes_in/diff_secs),0,bytes_in/diff_secs)
| addinfo
| eval start_time_adj=if(start_time<info_min_time,info_min_time,start_time)
| eval temp=mvrange(start_time_adj,end_time+1)
| table exporter_ip bps temp
| stats values(bps) as bps by temp, exporter_ip
| rename temp AS _time 
| bucket span=1s _time
| timechart cont=f partial=f sum(bps) as total_bps by exporter_ip

It's too bad that there isn't something like concurrency that can sum a field instead of just counting events.

to4kawa
Ultra Champion

This query generates a huge number of one-second bps rows, so it's very slow.
If you only need a single period, a different query would be more useful.

pestatp
Path Finder

My solution was to modify your answer a little, but the biggest thing I did was upgrade our Splunk indexer. The original server was getting quite old and was too slow for a query like this. The upgrade alone decreased the job time from 48 seconds to less than 2 for the same query.

to4kawa
Ultra Champion

Great! I'm surprised it changed that much.
