
Calculate total bps from NetFlow data

masato_sekiguch
Explorer

I am trying to calculate bandwidth from NetFlow data.

Each flow record has the timestamp at which the record was generated, the timestamp of the first packet (%FIRST_SWITCHED), the timestamp of the last packet (%LAST_SWITCHED), the total received bytes (%IN_BYTES), and the total sent bytes (%OUT_BYTES).
From these fields, I can compute the average bandwidth of each flow with the following formulas:

 in_bps = IN_BYTES * 8 / (LAST_SWITCHED - FIRST_SWITCHED)
 out_bps = OUT_BYTES * 8 / (LAST_SWITCHED - FIRST_SWITCHED)
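
As a worked example of the two formulas, here is a minimal Python sketch applied to the sample record shown further down in this post. It assumes the timestamps are epoch milliseconds (as they appear in the sample) and converts the duration to seconds; the function name average_bps and the zero-duration handling are illustrative choices, not part of the NetFlow data.

```python
def average_bps(byte_count, first_switched_ms, last_switched_ms):
    """Average bits per second over the flow's lifetime (timestamps in ms)."""
    duration_s = (last_switched_ms - first_switched_ms) / 1000.0
    if duration_s <= 0:
        # Assumption: a flow with no measurable duration has no defined rate.
        return None
    return byte_count * 8 / duration_s


# Values copied from the sample record in this post.
flow = {
    "FIRST_SWITCHED": 1543571246000,
    "LAST_SWITCHED": 1543571253000,
    "IN_BYTES": 3209400,
    "OUT_BYTES": 16550,
}
in_bps = average_bps(flow["IN_BYTES"], flow["FIRST_SWITCHED"], flow["LAST_SWITCHED"])
out_bps = average_bps(flow["OUT_BYTES"], flow["FIRST_SWITCHED"], flow["LAST_SWITCHED"])
print(round(in_bps), round(out_bps))  # 3667886 18914
```

The sample flow lasts 7 seconds, so the result is roughly 3.67 Mbps inbound and 18.9 kbps outbound.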

However, I am not sure how to create a timechart of the total bandwidth across multiple flows.

Currently, _time is set to the timestamp at which the flow record was received.
If I use it to derive total bandwidth with a search like "index=netflow | timechart span=5min sum(in_bps) by PROTOCOL", it does not give me the right figure.
To get an accurate result, I need to pick the flows that satisfy current_time >= FIRST_SWITCHED and current_time <= LAST_SWITCHED, and sum the in_bps or out_bps of those flows for that specific time.
But I have no idea how to write SPL that works with multiple timestamps in a single log entry.

It would be great if someone could give me some ideas on how to do this in Splunk.

Just in case, the NetFlow data is JSON like the following:
{"22":1543571246000,"11":443,"12":"xxx.xxx.xxx.xxx","23":16550,"24":233,"14":0,"57590":91,"1":3209400,"2":2154,"4":6,"5":0,"6":27,"7":13726,"8":"yyy.yyy.yyy.yyy","57659":"FQDN of URL","10":0,"21":1543571253000}

I wrote field extractions, so each JSON record is extracted into _time, LAST_SWITCHED, FIRST_SWITCHED, IN_BYTES, OUT_BYTES, PROTOCOL, and so on.


pestatp
Path Finder

Did you ever come up with a solution to this? I am attempting to solve the same problem and can't quite figure it out.

Unfortunately, I don't have the option to use Kafka, so an SPL solution would be best for me.

0 Karma

to4kawa
SplunkTrust
| makeresults 
| eval _raw="{\"22\":1543571246000,\"11\":443,\"12\":\"xxx.xxx.xxx.xxx\",\"23\":16550,\"24\":233,\"14\":0,\"57590\":91,\"1\":3209400,\"2\":2154,\"4\":6,\"5\":0,\"6\":27,\"7\":13726,\"8\":\"yyy.yyy.yyy.yyy\",\"57659\":\"FQDN of URL\",\"10\":0,\"21\":1543571253000}" 
| spath 
| rename 1 as IN_BYTES, 22 as FIRST_SWITCHED, 23 as OUT_BYTES, 21 as LAST_SWITCHED, 4 as PROTOCOL 
| table LAST_SWITCHED, FIRST_SWITCHED, IN_BYTES, OUT_BYTES, PROTOCOL 
| foreach *_SWITCHED 
    [ eval <<FIELD>>_p =strftime('<<FIELD>>'/1000 , "%F %T") ] 
| eval in_bps = IN_BYTES * 8 / ((LAST_SWITCHED - FIRST_SWITCHED) /1000) 
| eval out_bps = OUT_BYTES * 8 / ((LAST_SWITCHED - FIRST_SWITCHED) / 1000)

So far, this calculates in_bps and out_bps for each flow.

0 Karma

pestatp
Path Finder

This still wouldn't provide a total bandwidth from multiple events, correct?

I took the original question to be asking how to get the total bandwidth for a particular time from multiple events with total bytes and a time range.

0 Karma

to4kawa
SplunkTrust

There is only one sample log, so I can't build a query. Could you provide 10 to 20 samples?

0 Karma

Richfez
SplunkTrust

Maybe I'm missing something, but I don't see any glaring issues with your methodology (see more below) so what makes you think it's wrong? And is it wrong from the view of some other product's numbers, or is it wrong from the view of "given the data I have, Splunk is literally not adding things up right?"

Simplifying a bit, you have a timespan and a number of bytes. Doing the math you did should get you bits per (time unit of the time difference). I can't confirm which number in your JSON maps to which field, but my guess would be that 1543571246000 is either LAST_SWITCHED or FIRST_SWITCHED, and there's another timestamp like that later. Those are not seconds; they also include the milliseconds. So that means the math you have shows bits per millisecond, not bits per second. Could that be it?

0 Karma

masato_sekiguch
Explorer

Please ignore msec versus sec. It is not important here.

IN_BYTES is the total bytes received between FIRST_SWITCHED and LAST_SWITCHED.
I can derive the average bps with the formula I wrote, but in a bps graph, this average value should start at FIRST_SWITCHED and end at LAST_SWITCHED.
With the SPL I wrote, the bps value of the flow appears only at _time in the graph.
It does not take the session timespan between FIRST_SWITCHED and LAST_SWITCHED into account.

From a mathematical viewpoint, the bit rate of a flow is some function of time. I write fx(t) for flow x. The total bits received by flow x (IN_BYTES * 8) is then the integral of fx(t) from FIRST_SWITCHED to LAST_SWITCHED, and IN_BPS for flow x is that integral divided by the duration.

IN_BPS_x = integrate.quad(fx, FIRST_SWITCHED, LAST_SWITCHED)[0] / (LAST_SWITCHED - FIRST_SWITCHED)

To derive the total bandwidth at a particular time t1, we need to calculate the following:

total_bps at t1 = f1(t1) + f2(t1) + ... + fx(t1)
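
The sum above can be sketched in Python as follows. Since a NetFlow record only carries totals, this sketch assumes each fx(t) is constant at the flow's average rate between FIRST_SWITCHED and LAST_SWITCHED and zero elsewhere. The function name total_bps_at and the example flows are made up for illustration.

```python
def total_bps_at(t, flows):
    """Sum the average rates of every flow that is active at time t.

    flows is a list of (first_switched_s, last_switched_s, total_bits) tuples.
    Assumption: each flow transmits at its constant average rate while active.
    """
    total = 0.0
    for first, last, bits in flows:
        if last > first and first <= t <= last:
            total += bits / (last - first)
    return total


# Hypothetical flows: 1000 bits over 0..10 s and 2000 bits over 5..15 s.
flows = [(0, 10, 1000), (5, 15, 2000)]
print(total_bps_at(2, flows))   # 100.0 (only the first flow is active)
print(total_bps_at(7, flows))   # 300.0 (both flows overlap)
print(total_bps_at(20, flows))  # 0.0   (no flow is active)
```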

timechart span=5min sum(in_bps) does not do this calculation.
timechart just uses the _time value and sums all bps values without considering FIRST_SWITCHED and LAST_SWITCHED.

In our case, 50% of flows complete in less than a second, and the longest flows last 120 sec.
So just deriving a 5-minute average from IN_BYTES and OUT_BYTES would be good enough, although it does not consider LAST_SWITCHED and FIRST_SWITCHED. If I use LAST_SWITCHED for _time, the result may be more accurate.

index=netflow | bin _time span=5min | stats sum(IN_BYTES) as IN_BYTES sum(OUT_BYTES) as OUT_BYTES by _time,PROTOCOL | eval 5m_avg=(IN_BYTES+OUT_BYTES)/300/1024/1024 | timechart span=5min max(5m_avg) as 5m_avg by PROTOCOL

Regarding the flow data, the JSON keys map to these fields:

22: FIRST_SWITCHED
11: L4_DST_PORT
12: IPV4_DST_ADDR
23: OUT_BYTES
24: OUT_PKTS
14: OUTPUT_SNMP (interface number)
57590: L7_PROTO
1: IN_BYTES
2: IN_PKTS
4: PROTOCOL
5: SRC_TOS
6: TCP_FLAGS
7: L4_SRC_PORT
8: IPV4_SRC_ADDR
57659: HTTP_HOST
10: INPUT_SNMP (interface number)
21: LAST_SWITCHED
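
For anyone who wants to experiment with the sample record outside Splunk, the mapping above can be applied with a small Python sketch like this. The helper name decode_flow is hypothetical; only the key-to-field mapping comes from the list above.

```python
import json

# Key-to-field mapping copied from the list above.
FIELD_NAMES = {
    "22": "FIRST_SWITCHED", "11": "L4_DST_PORT", "12": "IPV4_DST_ADDR",
    "23": "OUT_BYTES", "24": "OUT_PKTS", "14": "OUTPUT_SNMP",
    "57590": "L7_PROTO", "1": "IN_BYTES", "2": "IN_PKTS", "4": "PROTOCOL",
    "5": "SRC_TOS", "6": "TCP_FLAGS", "7": "L4_SRC_PORT",
    "8": "IPV4_SRC_ADDR", "57659": "HTTP_HOST", "10": "INPUT_SNMP",
    "21": "LAST_SWITCHED",
}


def decode_flow(raw):
    """Parse one JSON flow record and rename numeric keys to field names."""
    record = json.loads(raw)
    # Unknown keys are kept under their original numeric name.
    return {FIELD_NAMES.get(key, key): value for key, value in record.items()}


raw = ('{"22":1543571246000,"11":443,"12":"xxx.xxx.xxx.xxx","23":16550,'
       '"24":233,"14":0,"57590":91,"1":3209400,"2":2154,"4":6,"5":0,"6":27,'
       '"7":13726,"8":"yyy.yyy.yyy.yyy","57659":"FQDN of URL","10":0,'
       '"21":1543571253000}')
flow = decode_flow(raw)
print(flow["IN_BYTES"], flow["LAST_SWITCHED"] - flow["FIRST_SWITCHED"])  # 3209400 7000
```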

0 Karma

Richfez
SplunkTrust

Ah, OK, so you want to take the average bit rate over the whole duration and spread it out per second (or whatever unit).

This is more difficult. It can be done, I have an example of something similar but it's a wee bit nasty and very specific to a certain type of data, so it'll take a bit of work before it's even ready to put here.

First, though, to the particular question you didn't quite ask: you can use LAST_SWITCHED as the time if you want. Just assign it like index=netflow | eval _time = LAST_SWITCHED / 1000 | bin _time ... and continue as you have (the division converts the millisecond timestamps in your sample to the epoch seconds that _time expects). And right, this works reasonably well as long as the minimum time span you plot is significantly longer than the longest flow duration in your data, but there are still a lot of boundary effects.

So, to solve the more general problem: the idea is to compute the average rate, break that up into second-by-second values via one of a couple of methods, then do your final summations.

So using placeholder numbers, let's say one flow has a start of 5 and an end of 10 (including 10) and a total of 120, and another has a start of 7, an end of 9, and a total of 90. You'd average...

The first one to 20 per unit of time, spread into (time: value) pairs 5: 20, 6: 20, 7: 20, 8: 20, 9: 20, 10: 20.
The second one to 30 per unit of time, spread into 7: 30, 8: 30, 9: 30.

Then re-add that back up per second: 5: 20, 6: 20, 7: 50, 8: 50, 9: 50, 10: 20. And tada, those are really the numbers you need to work with.
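
The spread-and-re-add steps above can be sketched in Python with the same placeholder numbers. This is only an illustration of the idea, not the SPL version; the function name spread_and_sum is made up, and the end slot is treated as inclusive to match the example.

```python
from collections import defaultdict


def spread_and_sum(flows):
    """Spread each flow's total evenly over its time slots, then sum per slot.

    flows is a list of (start, end, total) tuples with an inclusive end.
    """
    totals = defaultdict(float)
    for start, end, total in flows:
        slots = end - start + 1          # inclusive end, so 5..10 is 6 slots
        rate = total / slots             # average per unit of time
        for t in range(start, end + 1):  # one placeholder per slot
            totals[t] += rate
    return dict(sorted(totals.items()))


flows = [(5, 10, 120), (7, 9, 90)]
print(spread_and_sum(flows))
# {5: 20.0, 6: 20.0, 7: 50.0, 8: 50.0, 9: 50.0, 10: 20.0}
```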

As I said, I have some working code for a different problem that I think I can bend to doing this task so hopefully I can get a chance to sort it out in the next day or two and get a copy posted here.

I can fake up some close-enough data easily enough, and I should have time a little later this week to really dig through the example I have and convert it to this task. Someone else might jump in, but I think if you can wait a bit I'll have something worked up.

0 Karma

masato_sekiguch
Explorer

Thanks for your feedback.
The idea looks good, although I am not sure how to break up one JSON flow record into multiple placeholders. I am also wondering whether the suggested solution would be practical in an environment with more than 1,000 flows per second.
In my case, a long-lived session has a 120-second timespan. We would need 120,000 placeholders for 1,000 such long-lived sessions.
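
One way to avoid a placeholder per second, which is not something proposed in this thread but a common alternative, is to record only the rate change at each flow's start and just after its end, then take a running sum. That needs two entries per flow regardless of its duration. The Python sketch below assumes integer time slots with an inclusive end; the function name totals_by_change_point is hypothetical.

```python
from collections import defaultdict


def totals_by_change_point(flows):
    """Return (time, total_rate) pairs, one per point where the total changes.

    flows is a list of (start, end, total) tuples with an inclusive end.
    Each pair gives the total rate from that time until the next pair.
    """
    delta = defaultdict(float)
    for start, end, total in flows:
        rate = total / (end - start + 1)
        delta[start] += rate      # rate switches on at the start slot
        delta[end + 1] -= rate    # and off right after the end slot
    running = 0.0
    result = []
    for t in sorted(delta):
        running += delta[t]
        result.append((t, running))
    return result


flows = [(5, 10, 120), (7, 9, 90)]
print(totals_by_change_point(flows))
# [(5, 20.0), (7, 50.0), (10, 20.0), (11, 0.0)]
```

With the placeholder numbers used earlier in the thread, this gives 20 for slots 5 to 6, 50 for slots 7 to 9, 20 for slot 10, and 0 afterwards, which matches the per-second spreading result.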

I am currently evaluating Kafka. NetFlow data is sent to Kafka from the flow generator, and Splunk reads it via Kafka Connect for Splunk.
It might be much easier to analyze the flow data in the Kafka layer to derive bps, pps, or other statistics, create a new topic for those statistics, and read them from Splunk for visualization.
Although I am new to Kafka, I will also try that path and see which is more practical.

0 Karma