Splunk Enterprise Security

port scan across multiple "_time" spans

a_custom_user
Loves-to-Learn Lots

Hi all, using the following:

${index+sourcetype-information} NOT src_ip IN ("10.*","127.*","192.168.*","172.16.0.0/12") dest_ip IN ("10.*","192.168.*","172.16.0.0/12") dest_port>-1 NOT dest_port IN (80,443) | bin _time span=5m | stats dc(dest_ip) as d_C by src_ip dest_port | where d_C > 99

How can we get the "_time" of the first occurrence and the "_time" of the second occurrence? Also, does "span=5m" count, say, 45 results in 09:08:00 - 09:09:59 together with 52 results in 09:10:00 - 09:12:59?
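For reference, the epoch-aligned bucketing that bin performs can be sketched in plain Python (timestamps below are hypothetical; this is a sketch of the standard floor-to-span behaviour, not Splunk code):

```python
# Sketch (assumption): how an epoch-aligned `span=5m` bucket assignment works.
# Events from 09:08:00 - 09:09:59 and 09:10:00 - 09:12:59 land in different
# buckets, so their counts are never combined by a per-bucket stats.
from datetime import datetime, timezone

SPAN = 5 * 60  # 5 minutes in seconds

def bucket(epoch: int) -> int:
    """Floor an epoch timestamp to the start of its 5-minute span."""
    return epoch - (epoch % SPAN)

def hm(hour: int, minute: int, second: int = 0) -> int:
    """Helper: build an epoch timestamp on an arbitrary (hypothetical) date."""
    return int(datetime(2024, 1, 1, hour, minute, second,
                        tzinfo=timezone.utc).timestamp())
```

So an event at 09:08:30 is floored to the 09:05:00 bucket, while 09:10:15 is floored to 09:10:00.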

 


a_custom_user
Loves-to-Learn Lots

In the real-world scenario, we see 45 results in 09:08:00 - 09:09:59 and 59 results in 09:10:00 - 09:12:59 (the original question had a typo: 52 instead of 59). We found this by iterating manually, but neither the original query nor the "DTFrame" one returned the results. Hence, there must be some way to adjust the query so that the 104 results are detected.

 


ITWhisperer
SplunkTrust

OK, I think it is becoming clearer. Are you looking for a count over a rolling 5-minute window? That is, so that you can find when the count in any 5-minute period is greater than 99?
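A rolling window like that can be sketched in plain Python (hypothetical event data; the threshold of 99 comes from the query above):

```python
# Sketch (assumption): a rolling 5-minute window over (epoch, dest_ip) events,
# flagging any window where the distinct dest_ip count exceeds a threshold.
# Fixed 5-minute buckets can miss a burst that straddles a bucket boundary;
# a rolling window cannot.
from collections import deque

WINDOW = 300    # 5 minutes in seconds
THRESHOLD = 99  # flag when distinct dest_ip count exceeds this

def rolling_alerts(events):
    """events: iterable of (epoch_seconds, dest_ip), sorted by time.
    Yields (window_start, window_end, distinct_count) whenever the distinct
    dest_ip count within the trailing 5 minutes exceeds THRESHOLD."""
    win = deque()
    for t, ip in events:
        win.append((t, ip))
        # drop events older than 5 minutes before the current event
        while win and win[0][0] <= t - WINDOW:
            win.popleft()
        distinct = len({seen_ip for _, seen_ip in win})
        if distinct > THRESHOLD:
            yield (win[0][0], t, distinct)
```

With one event per second hitting a new dest_ip each time, the first alert fires at the 100th event, which a bucket-aligned count split 45/59 across two buckets would never raise.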


a_custom_user
Loves-to-Learn Lots

As it is a real-world scenario, the query should look into any 5-minute frame, not just frames between multiples of 5. Hence, in this case, it would be great if it is possible to draft a query that looks within any 5 minutes and/or any time frame.

 


ITWhisperer
SplunkTrust
${index+sourcetype-information} NOT src_ip IN ("10.*","127.*","192.168.*","172.16.0.0/12") dest_ip IN ("10.*","192.168.*","172.16.0.0/12") dest_port>-1 NOT dest_port IN (80,443) 
``` set 1 minute buckets ```
| bin _time span=1m 
``` collect dest_ip by src_ip and dest_port in _time buckets; these will be dedup'd by the values function ```
| stats values(dest_ip) as dest_ip by src_ip dest_port _time
``` autoregress to collect the previous 4 minutes' dest_ip collections ```
| autoregress dest_ip p=1-4
``` autoregress to get src_ip from 4 buckets back so we can detect a change in src_ip ```
| autoregress src_ip p=4
``` autoregress to get dest_port from 4 buckets back so we can detect a change in dest_port ```
| autoregress dest_port p=4
``` if src_ip and dest_port are the same as 4 buckets ago, append all dest_ip collections (splitting previous collections as we go) ```
| eval alldest_ips=if(src_ip=src_ip_p4 AND dest_port=dest_port_p4,mvappend(dest_ip,split(dest_ip_p1," "),split(dest_ip_p2," "),split(dest_ip_p3," "),split(dest_ip_p4," ")),null())
``` dedup all dest_ips from the last 5 minutes ```
| eval alldest_ips=mvdedup(alldest_ips)
``` and count how many are unique ```
| eval d_C=mvcount(alldest_ips)
``` rename dest_port to make the fields command easier (not necessary if you don't mind listing all the fields) ```
| rename dest_port AS port_dest 
``` remove unwanted fields ```
| fields - *_p* alldest_ips
``` rename port_dest back to its original name ```
| rename port_dest AS dest_port
``` filter results ```
| where d_C > 99
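The idea behind the query (distinct dest_ips per minute-bucket, then a union over the current bucket and the previous four) can be sketched in plain Python; field and function names below are illustrative, not Splunk's:

```python
# Sketch (assumption): per-minute distinct dest_ip collection followed by a
# union over the last 5 minute-buckets, per (src_ip, dest_port) pair -- the
# same shape as the stats + autoregress pipeline above.
from collections import defaultdict

def minute_union_counts(events, span=60, lookback=5):
    """events: iterable of (epoch, src_ip, dest_port, dest_ip).
    Returns {(src_ip, dest_port, bucket_start): distinct dest_ip count over
    that bucket and the preceding lookback-1 buckets}."""
    per_bucket = defaultdict(set)
    for t, src, port, dest in events:
        per_bucket[(src, port, t - t % span)].add(dest)
    counts = {}
    for (src, port, b) in per_bucket:
        union = set()
        for k in range(lookback):
            union |= per_bucket.get((src, port, b - k * span), set())
        counts[(src, port, b)] = len(union)
    return counts
```

With ten dest_ips in minute 0 and ten partially overlapping dest_ips in minute 1, the count for minute 1 is the 15-element union, not a per-bucket 10.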

a_custom_user
Loves-to-Learn Lots

Tried on my end, but somehow could not get it to produce the results. Want to check if it is possible to use something like:

${index+sourcetype-information} NOT src_ip IN ("10.*","127.*","192.168.*","172.16.0.0/12") dest_ip IN ("10.*","192.168.*","172.16.0.0/12") dest_port>-1 NOT dest_port IN (80,443)
| eval combination = _time.",".dest_ip
| stats values(combination) as V by src_ip dest_port
| where ( mvcount(V) > 99 )
| eval singleString = mvjoin(V, ";")
| rex max_match=0 field=singleString "(;|)(?<v>.*?{100})(;|)"

 

So that we can then collect 100 occurrences in another field, and then do splits and check something like:

Python reference:

newData = v.split('|')

# "differenceInTime" < 300 seconds

differenceInTime = int(newData[-1].split('^')[0]) - int(newData[0].split('^')[0])

# "destinationIpSet" length = 100

destinationIpSet = set(x.split('^')[1] for x in newData)

 

Is it possible to implement something like this?

 


ITWhisperer
SplunkTrust

Try this for your rex

| rex max_match=0 field=singleString "(?<v>([^;]+(;|)){1,100})"

Each instance of v should contain up to 100 semi-colon delimited groups 
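The same pattern behaves equivalently in Python's re engine, which makes it easy to verify the chunking locally (the helper name is illustrative):

```python
# Sketch (assumption): the rex above, reproduced with Python's re module.
# Each match greedily consumes up to 100 semicolon-delimited items, so
# consecutive matches yield non-overlapping chunks of at most 100 items.
import re

CHUNK = re.compile(r"(?P<v>([^;]+(;|)){1,100})")

def chunks_of_100(single_string):
    """Split a ';'-joined string into consecutive chunks of <= 100 items."""
    return [m.group("v") for m in CHUNK.finditer(single_string)]
```

For a 250-item string this produces chunks of 100, 100, and 50 items.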


a_custom_user
Loves-to-Learn Lots

Following our back-and-forth discussion, we tried:

${index+sourcetype-information} NOT host IN ("10.128.220.19","10.169.0.1*") NOT src_ip IN ("10.*","127.*","192.168.*","172.16.0.0/12") dest_ip IN ("10.*","192.168.*","172.16.0.0/12") dest_port>-1 NOT dest_port IN (80,443)
| eval combination = _time.",".dest_ip
| stats values(combination) as V by src_ip dest_port
| where ( mvcount(V) > 99 )
| eval singleString = mvjoin(V, ";")
| rex max_match=0 field=singleString "(?<v>([^;]+(;|)){1,100})"
| search v=*
| mvexpand v
| where mvcount(split(v, ";") ) > 99
| rex max_match=0 field=v "(?i)^(?<sT>[^,]*),[^;]*;([^,]*,[^;]*;){98}(?<eT>[^,]*),.*$"
| where ( eT - sT < 300 ) AND ( mvcount(mvdedup(split(replace(v, "\d+,", ""), ";"))) > 99 )
| eval sT = strftime(sT, "%c")
| eval eT = strftime(eT, "%c")
| table src_ip dest_port sT eT v
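The core per-chunk check this query performs (eT - sT under 300 seconds, and 100 distinct dest_ips in the chunk) can be sketched in plain Python; the function name and tuple layout are illustrative:

```python
# Sketch (assumption): the per-chunk test from the query above -- a chunk of
# (epoch, dest_ip) pairs qualifies as a scan if its first-to-last time span
# is under 300 seconds and it contains 100 distinct dest_ips.
def scan_detected(pairs, need=100, window=300):
    """pairs: list of (epoch, dest_ip) tuples sorted by time, as produced by
    splitting the "_time,dest_ip" combination values. Returns True when the
    chunk spans under `window` seconds and holds `need` distinct dest_ips."""
    if len(pairs) < need:
        return False
    span_ok = pairs[-1][0] - pairs[0][0] < window
    distinct_ok = len({ip for _, ip in pairs}) >= need
    return span_ok and distinct_ok
```

A chunk of 100 unique dest_ips one second apart qualifies; the same timestamps against a single dest_ip, or unique dest_ips spread four seconds apart (396 s total), do not.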

 

The only item we want to check is how to use "[^,]*," instead of "\d+," in "replace" (or another function) so that we can make it generic if possible.

 

It is able to detect scanning; in this case it is host enumeration.

A few issues with the query:

* Possibly computationally expensive

* It can only be run over overlapping frames, say from "00:55:00" to "01:05:00", "01:00:00" to "01:10:00", etc., and may return results that are common across the date-time frames, as it is mainly for high events-per-second (EPS) datasets averaging 15 K EPS

* If the scanning is on, say, "10.0.0.0/24" with all the IPs hit within 5 minutes (00:02:00 till 00:07:00), then there will be multiple results such as "10.0.0.0" to "10.0.0.99", "10.0.0.1" to "10.0.0.100", "10.0.0.2" to "10.0.0.101", etc., up to "10.0.0.156" to "10.0.0.255"
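One way to suppress the overlapping hits described in the last point is to report only the first qualifying window and then restart the scan past it. A sketch in plain Python, under the same assumptions as before (sorted (epoch, dest_ip) pairs; names are illustrative and this is not part of the original query):

```python
# Sketch (assumption): emit the first qualifying 100-event window, then skip
# past it before scanning again, so overlapping windows over the same burst
# are reported only once.
def non_overlapping_windows(pairs, need=100, window=300):
    """pairs: (epoch, dest_ip) tuples sorted by time. Returns a list of
    (start_epoch, end_epoch) for non-overlapping qualifying windows."""
    hits = []
    i = 0
    while i + need <= len(pairs):
        chunk = pairs[i:i + need]
        if chunk[-1][0] - chunk[0][0] < window and \
           len({ip for _, ip in chunk}) == need:
            hits.append((chunk[0][0], chunk[-1][0]))
            i += need  # jump past this window instead of sliding by one
        else:
            i += 1
    return hits
```

A sweep of 256 unique dest_ips one second apart then yields two windows rather than 157 overlapping ones.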

 


a_custom_user
Loves-to-Learn Lots

It will take me some time to understand the query and get more insight into each and every sub-query. I will post as soon as the process completes.


a_custom_user
Loves-to-Learn Lots

tried:

| bin _time span=5m as DTFrame
| stats dc(dest_ip) as d_C by src_ip dest_port DTFrame

but the "DTFrame" buckets are still aligned to multiples of 5 minutes.

 

How can we redraft?

 


ITWhisperer
SplunkTrust

Sorry, you are right: span=5m will create 5-minute chunks aligned to the hour, not to the earliest time.

If you want 5-minute chunks aligned to the earliest time, you will have to create a field that has been adjusted accordingly and bin based on that. For example:

| addinfo
| eval shifted=_time - (info_min_time % (5 * 60))
| bin span=5m shifted
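The shift-then-floor trick can be checked in plain Python (hypothetical timestamps; the function name is illustrative):

```python
# Sketch (assumption): subtract (earliest % span) before flooring to the span,
# then add the offset back, so buckets are aligned to the earliest event
# rather than to clock multiples of 5 minutes.
def shifted_bucket(t, earliest, span=300):
    """Bucket start for t, with 5-minute buckets aligned to `earliest`."""
    offset = earliest % span
    shifted = t - offset
    return (shifted - shifted % span) + offset
```

With earliest at 09:08:00, events at 09:08:00 and 09:11:00 share the 09:08:00 bucket, and 09:13:30 starts the next one at 09:13:00.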

Having said that, I am still not sure how this is going to help you 


ITWhisperer
SplunkTrust

bin _time span=5m has no effective impact on your query because your stats command does not include the _time field. If it were used, it would reset the _time field into 5-minute chunks, e.g. all events from 09:08:00 - 09:12:59 would have _time set to 09:08:00, all events from 09:13:00 - 09:17:59 would have _time set to 09:13:00, etc.
