
How to remove a cluster after applying DBSCAN?

rosho
Communicator

Hi

I am working on a forecasting problem.
I want to use DBSCAN to detect outliers and then apply a Kalman filter to make forecasts.

But I do not know how to remove or transform the samples inside a cluster.
How can I connect these two "algorithms"?

To apply DBSCAN:
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"

To forecast with the Kalman filter:
| predict "logins" as prediction algorithm=LLP5 holdback=288 future_timespan=324 upper95=upper95 lower95=lower95
| `forecastviz(324, 288, "logins", 95)`
| where prediction!="" AND 'logins' != ""
| `regressionstatistics("logins", prediction)`

Thank you


rosho
Communicator

This is the SPL:

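| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.6 "SS_logins"
| where NOT cluster==-1
| predict "SS_logins" as prediction algorithm=LLP holdback=288 future_timespan=324 upper95=upper95 lower95=lower95
| `forecastviz(324, 288, "SS_logins", 95)`

The | where NOT cluster==-1 line is how I remove the outliers: DBSCAN assigns its noise points to cluster -1.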


pdrieger_splunk
Splunk Employee

Hi rosho,

Assuming the outliers detected by DBSCAN are marked with cluster=-1, you can easily exclude them from the results of the first part of your search by filtering with | where cluster>-1. You can then run the forecasting part.
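Before filtering, it can help to check how many rows DBSCAN actually labels as noise, because that count depends heavily on eps (the question uses eps=0.2 and the accepted answer eps=0.6). A quick check, reusing the lookup and field names from the question:

```spl
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"
| stats count by cluster
```

The row with cluster=-1 gives the number of events that | where cluster>-1 would drop. If that is a large share of the data, eps is probably too small for this series.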

However, I would recommend making your timestamps equidistant, e.g. by running a | timechart command before the forecasting part, because many forecasting algorithms expect evenly spaced input and removing outliers leaves gaps in the series. You might also consider filling those gaps with imputed values, so that your forecasting model is trained on your "cleaned" assumptions. The Imputer might be useful here: https://docs.splunk.com/Documentation/MLApp/4.3.0/User/Algorithms#Imputer
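A minimal sketch of this idea using only core SPL commands. It assumes the lookup contains a _time field, and that the data is in 5-minute buckets (span=5m is a guess based on holdback=288, which would be one day of 5-minute points). Instead of dropping the DBSCAN noise rows, it blanks their logins value, so timechart still emits a bucket for every interval. It then carries the last good value forward with filldown as a very simple stand-in for proper imputation:

```spl
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"
| eval logins=if(cluster==-1, null(), logins)
| timechart span=5m avg(logins) as logins
| filldown logins
| predict "logins" as prediction algorithm=LLP5 holdback=288 future_timespan=324 upper95=upper95 lower95=lower95
| `forecastviz(324, 288, "logins", 95)`
```

filldown replaces each empty value with the previous non-empty one. The Imputer algorithm linked above would be the MLTK-native alternative for that step.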

Instead of | predict, I would also highly recommend having a look at the StateSpaceForecast algorithm, newly introduced in MLTK 4.2: https://docs.splunk.com/Documentation/MLApp/4.3.0/User/Algorithms#StateSpaceForecast

You might find this blog post useful; it explains the algorithm with an example: https://www.splunk.com/blog/2019/03/20/what-s-new-in-the-splunk-machine-learning-toolkit-4-2.html
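A rough sketch of swapping | predict for StateSpaceForecast at the end of the cleaned pipeline. The parameter names holdback, forecast_k and conf_interval are my reading of the MLTK 4.x algorithm reference and should be checked against the StateSpaceForecast page linked above for your version. The 5-minute span is again an assumption about the data:

```spl
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"
| where cluster>-1
| timechart span=5m avg(logins) as logins
| filldown logins
| fit StateSpaceForecast "logins" holdback=288 forecast_k=324 conf_interval=95
```

StateSpaceForecast names its output fields differently from predict, so check which fields it produces before passing them to a visualization macro such as forecastviz.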

Hope this helps!

rosho
Communicator

Is "intervention detection" the same as the "Imputer"?

Intervention detection

I would replace packets of contiguous missing values with hourly averages around the missing values. If the values are not missing but are anomalous, either manually adjust them or estimate what they should have been via Intervention Detection, which is essentially a forward prediction/fitted value for an anomaly.
Outliers represent effects/variables that are omitted from your model and, if possible, need to be identified and accounted for by adding additional predictor series or, in the worst case, dummy indicators.
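A rough core-SPL approximation of the "replace with hourly averages" step described in the quote. It assumes the results are already an evenly spaced series (for example from timechart) in which the removed outliers appear as empty logins values. It fills each gap with the average for the same hour of day across the whole dataset, which is a simplification of "hourly averages around the missing values" and is not intervention detection. It would go just before the forecasting command:

```spl
| eval hour=strftime(_time, "%H")
| eventstats avg(logins) as hourly_avg by hour
| eval logins=coalesce(logins, hourly_avg)
| fields - hour hourly_avg
```

eventstats ignores empty values when computing avg(logins), so each hour-of-day average is built only from the non-outlier points.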
