How to remove a cluster after applying DBSCAN?

rosho
Communicator

Hi

I am working on a forecasting problem.
I want to use DBSCAN to detect outliers and then apply Kalman filter to make forecasts.

But I do not know how to remove or transform the samples inside a cluster.
How can I connect these two "algorithms"?

#THIS IS TO APPLY DBSCAN
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins" 



#THIS IS TO FORECAST WITH KALMAN FILTER
| predict "logins" as prediction algorithm=LLP5 holdback=288 future_timespan=324 upper95=upper95 lower95=lower95 
| `forecastviz(324, 288, "logins", 95)`
| where prediction!="" AND 'logins' != ""
| `regressionstatistics("logins", prediction)`

Thank you


rosho
Communicator

This is the SPL:

| fit DBSCAN eps=0.6 "SS_logins"
| where NOT cluster==-1
| predict "SS_logins" as prediction algorithm=LLP holdback=288 future_timespan=324 upper95=upper95 lower95=lower95
| `forecastviz(324, 288, "SS_logins", 95)`

The second line is how I remove the outliers, i.e. the samples that DBSCAN assigns to cluster -1.


pdrieger_splunk
Splunk Employee

Hi rosho,

Let's assume the outliers detected by DBSCAN are marked with cluster=-1. You can then exclude them from the results of the first part of your search by filtering with | where cluster>-1, and run your forecasting part afterwards.

However, I would recommend having equidistant timestamps, e.g. by using a | timechart command before your forecasting part, because many forecasting algorithms expect evenly spaced input. You might also think about filling the gaps left by the removed outliers with imputed values, so that the forecasting model is trained on your "cleaned" assumptions. You might find the Imputer useful here: https://docs.splunk.com/Documentation/MLApp/4.3.0/User/Algorithms#Imputer
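
A minimal sketch of that suggestion, reusing the lookup and field names from the question. It assumes that the lookup carries a usable _time field, that the data is sampled every 5 minutes (a guess based on holdback=288 covering one day), and that Imputer writes its result to a field named Imputed_logins. Please check all three against your own data and MLTK version.

```spl
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"
| where cluster>-1
| timechart span=5m avg(logins) as logins
| fit Imputer logins strategy=mean
```

The where clause leaves holes where the outliers were. timechart puts the remaining points back on an even 5-minute grid with empty buckets, and Imputer fills those buckets with the mean of logins. The filled field can then be handed to the forecasting step.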

Instead of | predict, I would also highly recommend having a look at the StateSpaceForecast algorithm, newly introduced in MLTK 4.2: https://docs.splunk.com/Documentation/MLApp/4.3.0/User/Algorithms#StateSpaceForecast
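
As a rough sketch, the predict step from the question could be swapped for StateSpaceForecast like this. The parameter mapping is an assumption: holdback is kept at 288, forecast_k stands in for future_timespan, and conf_interval=95 replaces the upper95/lower95 options. Please verify the parameter names and the names of the output fields against the linked documentation for your MLTK version.

```spl
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"
| where cluster>-1
| fit StateSpaceForecast "logins" holdback=288 forecast_k=324 conf_interval=95
```

Whether forecast_k counts from the end of the training data or from the end of the full series should be checked before comparing the result with the predict output.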

You might find this blog useful that explains it with an example: https://www.splunk.com/blog/2019/03/20/what-s-new-in-the-splunk-machine-learning-toolkit-4-2.html

Hope this helps.

rosho
Communicator

Is "intervention detection" the same as the "Imputer"?

Intervention detection

I would replace packets of contiguous missing values with hourly averages around the missing values. If the values are not missing but are anomalous, either manually adjust them or estimate what they should have been via Intervention Detection, which is essentially a forward prediction/fitted value for an anomaly.
Outliers represent effects/variables that are omitted from your model and, if possible, need to be identified and accounted for by adding additional predictor series or, worst case, dummy indicators.
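
For the first part of the quoted advice, filling missing values with hourly averages, a plain-SPL sketch might look like the following. It is not an MLTK algorithm and it does not implement intervention detection. The 5-minute span, the presence of _time in the lookup, and the helper fields hour_bucket and hour_avg are assumptions made for illustration.

```spl
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN eps=0.2 "SS_logins"
| where cluster>-1
| timechart span=5m avg(logins) as logins
| eval hour_bucket=floor(_time/3600)*3600
| eventstats avg(logins) as hour_avg by hour_bucket
| eval logins=coalesce(logins, hour_avg)
| fields - hour_bucket hour_avg
```

Each empty 5-minute bucket takes the average of the remaining values in the same clock hour. An hour with no values at all stays empty.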
