Splunk Enterprise Security

How can I troubleshoot the 'Substantial Increase in Port Activity' ES Correlation Search?

gf13579
Communicator

I have a significant number of Notables raised by the Substantial Increase in Port Activity correlation search.

Picking the latest, just as an example:

A statistically significant increase in the volume of activity on port 9571 was noted. Today's value is 24.

The correlation search runs every 15 minutes, looks at network traffic from the last 24 hours and throttles by dest_port.

Filtering uses Extreme Search:

| xswhere count from count_by_dest_port_1d in network_traffic by dest_port is extreme

The Context Gen search behind count_by_dest_port_1d is Port Activity By Destination Port - Context Gen. Here it is, filtered on port 9571 and without the | xsCreateDDContext bit:

| tstats  `summariesonly` count as dest_port_traffic_count from datamodel=Network_Traffic.All_Traffic where All_Traffic.dest_port=9571 earliest=-30d@d latest=-1d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")` 
| `context_stats(dest_port_traffic_count, dest_port)` 

Running that manually shows me I have activity on this port for 19 of the last 30 days (the count value) and stats looking like this:

+-----------+-------+-----+-----+----------+--------+-----------+
| dest_port | count | min | max |   avg    | median |   size    |
+-----------+-------+-----+-----+----------+--------+-----------+
|      9571 |    27 |   1 | 383 | 52.62963 |     33 | 82.030552 |
+-----------+-------+-----+-----+----------+--------+-----------+

So how is 24 hits on port 9571 a significant increase?

I used the following to look at the thresholds for 'extreme' activity on that port:

| inputlookup network_traffic.context.csv 
| search class=9571 
| table class,concept,center,count,domainMax,domainMin,points,size,type

Giving:

+-------+---------+--------+-------+-----------+-----------+-------------------------+------+-----------------+
| class | concept | center | count | domainMax | domainMin |         points          | size |      type       |
+-------+---------+--------+-------+-----------+-----------+-------------------------+------+-----------------+
|  9571 | minimal |      7 |    16 |      20.8 |      -6.8 | -6.8|-3.3|0.0           |  4.6 | median_centered |
|  9571 | low     |      7 |    16 |      20.8 |      -6.8 | -6.8|-3.3|0.0|3.5|7.0   |  4.6 | median_centered |
|  9571 | medium  |      7 |    16 |      20.8 |      -6.8 | 0.0|3.5|7.0|10.4|13.9   |  4.6 | median_centered |
|  9571 | high    |      7 |    16 |      20.8 |      -6.8 | 7.0|10.4|13.9|17.3|20.8 |  4.6 | median_centered |
|  9571 | extreme |      7 |    16 |      20.8 |      -6.8 | 13.9|17.3|20.8          |  4.6 | median_centered |
+-------+---------+--------+-------+-----------+-----------+-------------------------+------+-----------------+

I can see there that the activity count for port 9571 over the last 24 hours (24) is above the threshold for extreme (the 20.8 shown above). The same can be seen in the pretty version generated as a column chart from this search:

| xsdisplaycontext from count_by_dest_port_1d in network_traffic by 9571

Question: why are the thresholds so low (20.8+ is 'above extreme'), if the activity from the last 30 days has a median of 33? Does it relate to me only having activity on 19 of those 30 days?

The daily Context Gen is using xsCreateDDContext (not xsUpdateDDContext). I've been assuming that would destroy any existing results and regenerate the context based on activity per dest_port per day over the last 30 days. Does it instead adjust the existing thresholds?

The XSV app isn't installed. If there's something I can do using that app that I'm not able to get from these raw commands, I'd be interested in hearing the steps required.

splunk2019vg
New Member

Hi, I'm looking at exactly the same thing. I have lots of port activity notable events. Can you suggest how you tuned XS?

jtlittle
Path Finder

I would just turn it off and work on actionable alerts.

The data volume that's leaving is more interesting than the fact that ports are busy.

I find that if the built-in alerts are noisy they need to be tuned, and it's easier for me to build out each alert based on the technology stack than to rely on this type of single-source signal. Maybe pair it with other data to give it more relevance. Oh, and what the poster above (^^^) says about adjusting it, if you know what to change it to.

gf13579
Communicator

Thanks. Nice point around the amount of data that's leaving.

I've been tempted to turn it off ever since I turned it on, I just didn't want to do so without understanding whether there was something relatively simple I could do to make it provide some tangible value. At this point I'm struggling, though it's been a good learning exercise.

jtlittle
Path Finder

What I find with Splunk, and where my skill is: everything is relatively time consuming and at a developer level (using Add-on Builder / data enrichment [tag/normalize]) when it comes to enriching your data so that the alerts correlate anything beyond single sources (port activity). That's a result of missing the data that would make the cooler and more useful alerts fire. I think maybe the CIM is not being enriched by other sources of data.

I see this alert fire first, before anything else, on my ES Cloud stack. Really annoying, but I consider it a dumb alert that reminds me I need to focus on CIM data and show more enrichment by sending different types of logs/sources/signals.

I also copy my correlation searches to a wiki-type thing and check it once a week to see if I can add, remove, or update my rules for better alerting. At that point I can turn anything back on if testing didn't go as planned (all in Dev).

mcormier_splunk
Splunk Employee

You are correct in that xsCreateDDContext is really just a 30-day moving window that models the amount of bytes per port. I would suggest rewriting the Context Gen to use xsUpdateDDContext instead, running nightly at 1am (or any time early in the morning) with a 1-day lookback in the tstats (earliest=-1d@d latest=@d). Once you change the create to update, you can run the create version by hand, but use a much larger window (earliest=-360d@d latest=@d, or go back farther if you're so inclined).

These new searches will give you a base of one year's data as a model. The update will add its information in a weighted-average fashion so it doesn't skew the underlying model too much.
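
As a rough sketch, the two searches might look something like the following, based on the Context Gen search quoted earlier. The xsUpdateDDContext/xsCreateDDContext argument lists here are assumptions mirroring the out-of-the-box call, so check them against your Extreme Search version.

The nightly update (scheduled early in the morning, 1-day lookback):

| tstats `summariesonly` count as dest_port_traffic_count from datamodel=Network_Traffic.All_Traffic where earliest=-1d@d latest=@d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")`
| `context_stats(dest_port_traffic_count, dest_port)`
| xsUpdateDDContext name=count_by_dest_port_1d container=network_traffic scope=app

The one-off re-baseline, run by hand with the larger window:

| tstats `summariesonly` count as dest_port_traffic_count from datamodel=Network_Traffic.All_Traffic where earliest=-360d@d latest=@d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")`
| `context_stats(dest_port_traffic_count, dest_port)`
| xsCreateDDContext name=count_by_dest_port_1d container=network_traffic scope=app type=median_centered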

Please let me know if you have more questions or if this isn't making any sense. I'd also strongly suggest installing XSV as the dashboards it provides are much more intuitive and functional (yeah d3 charts!!!).

jcoates
Communicator

What Mike said 🙂 This is likely to be a false positive from a short training window; however, extending the training window can produce false negatives as well (see the holiday problem discussion in this deck: http://www.scianta.com/docs/Coates_MLtoCC_Talk_2017.pdf ). Mike's suggestion uses weighting on the updates to try and smooth out that effect.

Here's the XSV app suggested: https://splunkbase.splunk.com/app/2855/

And for luck, here's George Starcher's excellent series on using XS: http://www.georgestarcher.com/tag/extreme-search/

gf13579
Communicator

Hi Mike and Jack.

I've got plenty of training data - 90 days, though I'm using the out-of-the-box Context Gen window (-30d to -1d).

I've been seeing weirdness with the context not being updated (at all) or being updated with the wrong values. The resultant lookup file is 53MB, due to 5 rows (high, extreme, etc.) per port, and the underlying tstats search producing results for over 60,000 ports. I've tried splitting everything in half (ports <30k and ports >=30k) and no longer see any issues with the values not being updated.

I think that may have been the cause of the confusing numbers in my post above, where the median (33) returned by tstats isn't the same as the center value in the context generated from the same data.

Even after addressing this, the 'is extreme' criterion in the correlation search is being matched by activity for 16,000 ports, so without some further thought it's not providing much value. I don't know whether this is due to an unusual customer environment, or whether no-one ever rolls this search out without limiting it to a specific set of assets, filtering on allowed traffic only, filtering on external destinations, etc.
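
For illustration, the pre-filtering I have in mind would be something like this variant of the Context Gen tstats; the action value and the private-range wildcards are assumptions that would need adapting to the environment (172.16.* doesn't cover the whole 172.16.0.0/12, for instance):

| tstats `summariesonly` count as dest_port_traffic_count from datamodel=Network_Traffic.All_Traffic where All_Traffic.action="allowed" NOT (All_Traffic.dest_ip="10.*" OR All_Traffic.dest_ip="192.168.*" OR All_Traffic.dest_ip="172.16.*") earliest=-30d@d latest=-1d@d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")`
| `context_stats(dest_port_traffic_count, dest_port)`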

As an example, port 10007 over the last 24 hrs has an activity count of 385. That's matching extreme, based on a context generated from 30 days that looks like this:

https://imgur.com/a/Bac62

The resultant thresholds are shown in the second image there.

I take it the reason for the overlaps is the erratic per-day levels in the source data? It makes me wonder whether flagging unusual dest port activity per {source, dest_port} would be of greater value, though I'm guessing the context data would be huge.

mcormier_splunk
Splunk Employee

I believe you are running into a situation where contexts are normalized and then adjusted based on modifying the max, IIRC. Normalized contexts are pretty, but usually not an effective way to model the underlying data, as it's almost never normally distributed. What I think you really want is a model of the expected amount of bytes received when you receive any bytes at all. To create/update a context like this, you need to look at using one of two different methods. No matter which method you use, you will probably want to filter out any period where #bytes = 0, for the reason I mentioned above. The question you are trying to answer is: "If I receive any bytes on this port, is the amount of bytes unusual?"

Method #1: Use xsCreateDDContext type=avg_centered. By using this type (or type=median_centered), the keys are the center and size. By default, center is either the median or avg and the size is the stdev. In your case you probably want avg_centered. The reason for this approach is that it calculates an expected (avg) number of bytes and then uses stdev to determine the distance from the avg. Extreme can then be 2, 3 or 4 stdevs away from the center, depending on your level of sensitivity.
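
A minimal sketch of Method #1 under those assumptions, modeling daily bytes per port and filtering out the zero days (the context name bytes_by_dest_port_1d is illustrative, and the xsCreateDDContext arguments should be checked against the XS documentation):

| tstats `summariesonly` sum(All_Traffic.bytes) as bytes from datamodel=Network_Traffic.All_Traffic where earliest=-30d@d latest=-1d@d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")`
| where bytes > 0
| `context_stats(bytes, dest_port)`
| xsCreateDDContext name=bytes_by_dest_port_1d container=network_traffic scope=app type=avg_centered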

Method #2: Use xsCreateCDContext (create a Crossover Driver [CD] context). A CD context is one where you calculate the points at which you "cross over" from one concept to the next. CD contexts are almost never normalized, as the crossover points tend to be generated by methods such as perc(). For example, if you're using "minimal,low,medium,high,extreme" as your concepts, you might calculate the crossover points like this:

| stats perc20(bytes) as minimal_low, perc40(bytes) as low_medium, perc60(bytes) as medium_high, perc80(bytes) as high_extreme by dest_port
| xsCreateCDContext ...

You can also add a min and/or max definition and, of course, adjust any of these percs based on your needs. This is pretty well documented in the XSV app.
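
Putting Method #2 together end to end, again filtering out the zero-byte days first (the xsCreateCDContext argument names here are assumptions; the XSV app documents the real ones):

| tstats `summariesonly` sum(All_Traffic.bytes) as bytes from datamodel=Network_Traffic.All_Traffic where earliest=-30d@d latest=-1d@d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")`
| where bytes > 0
| stats perc20(bytes) as minimal_low, perc40(bytes) as low_medium, perc60(bytes) as medium_high, perc80(bytes) as high_extreme by dest_port
| xsCreateCDContext name=bytes_by_dest_port_1d container=network_traffic scope=app terms="minimal,low,medium,high,extreme"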

What contexts always come down to is these kinds of questions: What am I really modeling? Do I understand my data and how it behaves? What behavior of the data am I interested in? How sensitive should my measurements be? These questions are almost always driven by an SME or "data scientist". Extreme Search isn't really too difficult to learn. It's figuring out what you need to know about your data that takes the time and energy.

If you want me to join you on a webex for a bit to go through any of this, just let me know. I'm more than happy to help!

gf13579
Communicator

Thanks again guys.

Mike - I might take you up on the webex at some point.

I'll be honest: as much as I'm enjoying learning a bit about Extreme Search, my more pressing requirement is to work out how (or whether to even bother) to make this out-of-the-box correlation search provide value, without investing much more time than I've already spent.

"It's figuring out what you need to know about your data that takes the time and energy."

Understood. I guess in this context I'm interested in alerting on one or more machines suddenly communicating with destination ports they don't normally talk to. Maybe that's always going to result in too much noise to provide value without limiting it in some other way, or maybe there's some refinement to the model that would identify that more clearly.
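
For instance, something like this hypothetical first-seen search (no Extreme Search involved; the field names and window are assumptions that would need tuning, and it only sees as far back as the summary's retention):

| tstats `summariesonly` count min(_time) as firstTime from datamodel=Network_Traffic.All_Traffic by All_Traffic.src, All_Traffic.dest_port
| `drop_dm_object_name("All_Traffic")`
| where firstTime >= relative_time(now(), "-1d@d")

That would list {src, dest_port} pairs first observed in the summarized data within the last day.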

jcoates
Communicator

Hi GF,

Context not getting updated is definitely concerning. If it's XS crashing, then we need to know and fix it, so if you're getting core dumps please open a support ticket and reference Extreme Search. If it's Splunk having to bail out and skip those searches, who knows what else is skipping? On ES that could mean time windows not being reviewed for whatever the correlation searches look for, which potentially gets into due diligence and compliance territory.

I do think that the question being asked here (do we have unusual traffic on this port?) is a little questionable... in a lab you might see lots of activity on 1-1024 and a handful of apps (games, IM, fileshare, VOIP and videoconference) up in the high ports, but on a production network with thousands of real people using the real internet there's going to be a lot of churn as fashions change and apps evolve. Regardless of how it's implemented, an ML test depends on the past predicting the future; as you're seeing, the data on a given high port is pretty erratic and doesn't make for a good prediction. It would probably be more interesting to ask whether the data for a port is changing from erratic to stable or from stable to erratic... maybe, instead of a count of connections or bytes, something like "my data | fields bytes port action | fieldsummary | stats max(stdev) by port"??
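
One hypothetical way to get at that, using a per-port coefficient of variation over daily byte totals instead of fieldsummary (all names illustrative):

| tstats `summariesonly` sum(All_Traffic.bytes) as bytes from datamodel=Network_Traffic.All_Traffic where earliest=-30d@d latest=@d by All_Traffic.dest_port,_time span=1d
| `drop_dm_object_name("All_Traffic")`
| stats stdev(bytes) as bytes_stdev avg(bytes) as bytes_avg by dest_port
| eval cv=round(bytes_stdev/bytes_avg, 2)
| sort - cv

A high cv means the port is erratic; comparing cv across two windows would show a port moving from erratic to stable or back.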
