I am trying to build a report from AWS VPC Flow Logs that can be used to analyze security group (SG) usage. Specifically, I want a list of incoming traffic (by 'dest_ip') that shows all IP/port combinations. Unfortunately, a simple 'stats count by dest_ip,dest_port,protocol,src_ip,src_port' does not produce a usable report, because all the stateful return traffic is listed too. There are tens of thousands of incoming packets with dest_port in the 1024-65535 range, i.e., where that particular 'dest' server had initiated a connection from an ephemeral local port and the return traffic came back to that same port. So 99% of the 'incoming' ports are not actual listeners that we need to include in our SGs.
I have spent hours testing various combinations of filters, e.g. count<5, dest_port>18000, (dest_port>1024 AND src_port<1024), or even a 'where NOT IN(src_port,22,53,80,3389, etc)'. But we have a lot of services that use high port numbers, so all of these methods accidentally remove valid traffic.
Instead, I think the only accurate method would be one where each connection is evaluated for:
- is the incoming 'dest_port' above 1024?
- if so, is there a corresponding packet in the preceding 1000 ms, i.e., identical-but-reversed dest and src IP/ports?
- if so, assume this later packet is the return from a stateful request sent on an ephemeral port -- remove it from the results!
Has anyone else run into this situation, and what was your solution? Thank you for any suggestions!
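The reversed-flow check described above could be approximated in SPL along the following lines. This is only a sketch under assumptions, not a tested solution: it uses the field names from this thread, the names fwd, rev and flow_key are made up for illustration, and instead of a 1000 ms look-back it swaps in a simpler rule: keep the earliest record of each bidirectional flow and treat that record's dest_port as the listener.

```
| eval fwd=src_ip.":".src_port.">".dest_ip.":".dest_port
| eval rev=dest_ip.":".dest_port.">".src_ip.":".src_port
| eval flow_key=if(fwd<rev, fwd, rev)
| sort 0 _time
| dedup flow_key protocol
| stats count by dest_ip, dest_port, protocol
```

Both directions of a connection get the same flow_key, so dedup keeps only whichever record was seen first. Flow log records are aggregated over a capture window and their start/end times are in whole seconds, so a request and its reply can carry the same timestamp; in that case the ordering is arbitrary and the result should not be trusted. Connections that were already open before the search window starts can also be attributed to the wrong side.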
You can create another field, the application port (app_port), which takes the lower of the two port numbers on the assumption that the application port is lower than the random client port, as follows:
| eval app_port=if(((src_port<dest_port AND src_port!=0) OR dest_port==0),src_port,dest_port)
Then use app_port in your stats and/or WHERE.
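As a rough illustration of that last step, the eval above could feed a stats like the one below. The server_ip field is a hypothetical helper, not part of the original suggestion; it picks the IP on the same side as app_port so that request and return traffic collapse into one row per listener.

```
| eval app_port=if(((src_port<dest_port AND src_port!=0) OR dest_port==0),src_port,dest_port)
| eval server_ip=if(app_port==dest_port, dest_ip, src_ip)
| stats count by server_ip, app_port, protocol
```

When both ports are equal, this sketch attributes the row to dest_ip. It inherits the same assumption as app_port itself: it is only right when the listener really is the lower-numbered port.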
Unfortunately, we have app-ports all the way into the 50,000 port-range, and the ephemeral ports start at 1024. So 'src_port
In our product, NetFlow Optimizer, there is a rule/module that stitches request/reply flows. It is based on a list of known application ports (https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers), but the list is configurable (you can upload your own list into the rule). Please contact us directly if you have any questions or would like to try it: trials@netflowlogic.com
For the moment, this is the approach I am taking:
This process allowed me to identify 25 incoming ports. I don't know if this is all of them, but ignoring the return traffic from those 'src_ports' reduced the list of src/dest combinations from 300K to 10K.
I would also have liked to filter on 'WHERE src_port>1024', since that is always true for ephemeral ports, but I didn't know how to combine the two filters in one WHERE; when I tried it I got unexpected results.
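One possible way to combine the two conditions in a single WHERE is shown below, as a sketch only. The port numbers are placeholders (3389 is taken from the earlier example list, 50000 is invented) and would be replaced by the 25 identified listener ports.

```
| where src_port>1024 AND NOT (src_port==3389 OR src_port==50000)
```

The parentheses around the OR group are the important part: they make sure NOT applies to the whole list of listener ports and that the result does not depend on how AND and OR are ordered. A missing set of parentheses is one plausible cause of the unexpected results, though that is a guess without seeing the original search.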