
Problem with timechart showing portscanning by srcip

New Member

I wanted a timechart to show portscanning of Juniper routers, but have run into a snag that I can't figure out. The syslog message from the router follows this format:
2018-06-25T17:19:51+00:00 PFE_FW_SYSLOG_IP: FW: D (tcp|udp) (1 packets)

I'm defining a portscan as any srcip that hits any router on 10 or more distinct ports within a 30-second window. Here's the Splunk query I'm using:

sourcetype=syslog PFE_FW_SYSLOG_IP AND " D "  AND NOT (" 3784 " OR " 179 ") | rex field=_raw "(?<srcip>\d+\.\d+\.\d+\.\d+) (?<dstip>\d+\.\d+\.\d+\.\d+) (?<srcport>\d+) * (?<dstport>\d+)" | where dstport>=1 AND dstport<=30000 | bucket span=30s _time | eventstats dc(dstport) AS port_scan by srcip, dstip, _time | where port_scan > 10 | timechart dc(dstport) by srcip useother=f usenull=f

The timechart works properly when I've selected 8 hours of data, but stops working beyond 10 hours of previous data. If I slide the time selection to specify the previous time range, then timechart shows that there are srcips that meet the criteria that were not previously shown.

Any clues on how to get this to work?


Esteemed Legend

When you do not specify a span= clause in your timechart, you are letting Splunk automagically determine what it should be based on the overall span of your Timepicker. In other words, you are not always using a 30-second window; the longer your Timepicker span, the longer the span= it will use. You can start by using timechart span=30s to gain consistency, but you will quickly find that you cannot use very wide Timepicker spans because you will hit the 50K plotted element limit.
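
To get a rough feel for how far span=30s stretches before that limit bites, the arithmetic can be run in Splunk itself. This is only a sketch: it takes the 50K figure above at face value, and series=10 is a made-up number of srcip series, assuming the limit is shared across all plotted series.

```spl
| makeresults
| eval max_points=50000, span_seconds=30, series=10
| eval hours_single_series=round(max_points * span_seconds / 3600, 1)
| eval hours_per_series=round(max_points * span_seconds / 3600 / series, 1)
| table max_points span_seconds series hours_single_series hours_per_series
```

With these inputs a single series covers about 416.7 hours, and ten series about 41.7 hours each, so the usable Timepicker range shrinks quickly as more srcips qualify.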


New Member

I didn't realize that span= is more of a suggestion than an actual setting. You're right about hitting the 50K plotted element limit when increasing the Timepicker span. To deal with that, I use the sample option and set it to 1:10.


Esteemed Legend

OK, then UpVote any good answers and click Accept on the best one.


SplunkTrust

1) Timechart has a maximum of 50K data points. At 30 seconds per point, that's less than 500 hours of srcip-dstip time, so you are probably running into that limit when the time range gets too long.

2) Your fixed 30 second windows might miss some cases where the scan crossed a 30-second boundary... but in practice, if they are port scanning, they are unlikely to spread it over very much time.

3) You can probably extend your ability to view the data if you use streamstats to tag each srcip that hits your criteria for a few minutes before and after, rather than all of the data of each one.
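
On point 2, one way to avoid the fixed-boundary problem might be a sliding window with streamstats time_window instead of bucket. This is an untested sketch, not a drop-in replacement: it assumes streamstats applies the 30-second window separately to each srcip/dstip pair when a by clause is given, and it counts distinct ports over the 30 seconds leading up to each event.

```spl
sourcetype=syslog PFE_FW_SYSLOG_IP AND " D " AND NOT (" 3784 " OR " 179 ")
| rex field=_raw "(?<srcip>\d+\.\d+\.\d+\.\d+) (?<dstip>\d+\.\d+\.\d+\.\d+) (?<srcport>\d+) * (?<dstport>\d+)"
| where dstport>=1 AND dstport<=30000
| sort 0 _time
| streamstats time_window=30s dc(dstport) AS port_scan by srcip, dstip
| where port_scan >= 10
| timechart span=30s dc(dstport) by srcip useother=f usenull=f
```

streamstats with time_window needs the events in time order, hence the sort. The number of events held in the window is also capped (max_stream_window in limits.conf, as far as I know), so very busy sources may be undercounted.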


Here's a quick attempt to cull down the data a bit for you...

sourcetype=syslog PFE_FW_SYSLOG_IP AND " D "  AND NOT (" 3784 " OR " 179 ") 
| rex field=_raw "(?<srcip>\d+\.\d+\.\d+\.\d+) (?<dstip>\d+\.\d+\.\d+\.\d+) (?<srcport>\d+) * (?<dstport>\d+)" 
| where dstport>=1 AND dstport<=30000 
| bucket span=30s _time 
| stats dc(dstport) AS port_scan values(dstport) as dstport by  srcip, _time, dstip

| rename COMMENT as "mark the scans, then add up how many scans each srcip has"
| eval scanFlag=case(port_scan>=10,1)
| eventstats sum(scanFlag) as bigFlag by srcip

| rename COMMENT as "mark all times within 5 minutes of a scan before or after"
| streamstats window=10 max(scanFlag) as keepme2 by srcip 
| reverse   
| streamstats window=10 max(scanFlag) as keepme3 by srcip 

| rename COMMENT as "Keep data within 5 minutes of a scan, or keep all with a srcip with more than 10 scans"
| where isnotnull(keepme2) OR isnotnull(keepme3) OR bigFlag>10

| rename COMMENT as "Now show the chart"
| timechart dc(dstport) by srcip useother=f usenull=f


| appendpipe [| stats max(port_scan) AS max_scan by srcip | sort 50 - max_scan ]
| eventstats max(max_scan) as max_scan by srcip
| where isnotnull(max_scan)

You may need to cull the list somewhat, since there are valid processes that intentionally rotate ports.


New Member

You have an interesting suggestion; however, running it produces no output.


Contributor

Hi,

Have you taken a look at the limit parameter of the timechart command?
If you try:

 sourcetype=syslog PFE_FW_SYSLOG_IP AND " D "  AND NOT (" 3784 " OR " 179 ") 
| rex field=_raw "(?<srcip>\d+\.\d+\.\d+\.\d+) (?<dstip>\d+\.\d+\.\d+\.\d+) (?<srcport>\d+) * (?<dstport>\d+)" 
| where dstport>=1 AND dstport<=30000 
| bucket span=30s _time 
| eventstats dc(dstport) AS port_scan by srcip, dstip, _time 
| where port_scan > 10 
| timechart dc(dstport) by srcip useother=f usenull=f limit=0

Does it change anything?

Kail


New Member

No, the timechart does not show any data from more than 8 hours back. The limit=0 didn't change anything.


Champion

Hmm, could it be a visualization issue? What @somesoni2 is asking is whether you are validating the results by looking at the chart (Visualization) tab only. It is possible that your counts have abnormal peaks, which can make the smaller values hard to see on the chart. Can you check the Statistics tab (or export the results from it as a CSV) and see whether the rows for the earlier hours really show 0 or no counts?
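
One way to check this without relying on the chart rendering is to summarise the matching events per srcip as a table. A rough sketch, reusing the search from the question; the field names first_seen, last_seen, and ports are only illustrative:

```spl
sourcetype=syslog PFE_FW_SYSLOG_IP AND " D " AND NOT (" 3784 " OR " 179 ")
| rex field=_raw "(?<srcip>\d+\.\d+\.\d+\.\d+) (?<dstip>\d+\.\d+\.\d+\.\d+) (?<srcport>\d+) * (?<dstport>\d+)"
| where dstport>=1 AND dstport<=30000
| bucket span=30s _time
| eventstats dc(dstport) AS port_scan by srcip, dstip, _time
| where port_scan > 10
| stats min(_time) AS first_seen max(_time) AS last_seen dc(dstport) AS ports by srcip
| convert ctime(first_seen) ctime(last_seen)
```

If srcips with a first_seen older than 8 hours show up here but not in the chart, the search is finding them and the problem is on the visualization side.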


SplunkTrust

How are you displaying the timechart output, in a table or in a chart? Give the query below a try as well. (When you post a query, leave a blank line before it, select the whole query, and press Ctrl+K or click the '101 010' button at the top of the text area. Otherwise the query is not formatted as code and several portions of it go missing.)

sourcetype=syslog PFE_FW_SYSLOG_IP AND " D " AND NOT (" 3784 " OR " 179 ") | rex field=_raw "(?<srcip>\d+\.\d+\.\d+\.\d+) (?<dstip>\d+\.\d+\.\d+\.\d+) (?<srcport>\d+) * (?<dstport>\d+)" | where dstport>=1 AND dstport<=30000 | bucket span=30s _time | eventstats dc(dstport) AS port_scan by srcip, dstip, _time | where port_scan > 10 | timechart dc(dstport) by srcip useother=f usenull=f span=30s

New Member

Thanks for the ctrl-K tip. I'm using a timechart as output.
