## Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm

Hi,

I want to use the Clustering Algorithm "DBSCAN" from the Machine Learning Toolkit.
(https://docs.splunk.com/Documentation/MLApp/2.3.0/User/Algorithms) --> listed under "clustering algorithms"

While implementing it, I noticed that this algorithm accepts only one parameter: EPS
(the maximum distance between two samples for them to be considered part of the same cluster)

Now if you look up any definition of the DBSCAN Algorithm, for example...
(https://en.wikipedia.org/wiki/DBSCAN)
...you will notice that DBSCAN needs 2 parameters to work:

• EPS (Epsilon): maximum distance between two samples for them to count as neighbors --> provided
• minPts: minimum number of samples within EPS of a point for that point to seed a cluster --> missing
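To make the role of the two parameters concrete, here is a minimal pure-Python sketch of the textbook DBSCAN procedure. It is only an illustration: the names `dbscan_sketch`, `region_query`, and `POINTS` are made up for this example, and it is not the code used by the Machine Learning Toolkit or scikit-learn.

```python
from math import dist  # Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan_sketch(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not dense enough; may become a border point later
            continue
        cluster += 1  # points[i] is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points keep expanding
                queue.extend(j_neighbors)
    return labels

# Illustrative data: a group of four, a pair, and one isolated point.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (50, 50)]

labels_min2 = dbscan_sketch(POINTS, eps=1.5, min_pts=2)
labels_min3 = dbscan_sketch(POINTS, eps=1.5, min_pts=3)
```

With the same EPS, the pair at (10, 10) and (10, 11) forms its own cluster when `min_pts=2` but is labeled noise when `min_pts=3`, so the result depends on both parameters.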

Does anybody know why the second parameter is missing?
I don't get how this algorithm can work without it...

## Re: Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm

@hbrandt84, I concur. scikit-learn also mentions two parameters, namely `min_samples` and `eps` (http://scikit-learn.org/stable/modules/clustering.html#dbscan)

However, the algorithm description and class details state that both parameters are optional:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

Based on the following code for the DBSCAN algorithm, I would expect the default to be `min_samples=5` when no value is passed (https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/cluster/dbscan_.py#L156):

```python
def dbscan(X, eps=0.5, min_samples=5, metric='minkowski',
           algorithm='auto', leaf_size=30, p=2, sample_weight=None, n_jobs=1):
    ...  # function body omitted in this quote
```

And:

```python
def __init__(self, eps=0.5, min_samples=5, metric='euclidean',
             algorithm='auto', leaf_size=30, p=None, n_jobs=1):
    self.eps = eps
    self.min_samples = min_samples
    self.metric = metric
    self.algorithm = algorithm
    self.leaf_size = leaf_size
    self.p = p
    self.n_jobs = n_jobs
```

However, this needs to be confirmed, and the Machine Learning Toolkit could possibly be enhanced to expose a `min_samples` input parameter for DBSCAN.
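One way to check such defaults without reading the source is `inspect.signature` from the standard library. The sketch below uses a hypothetical stand-in class, `DBSCANStandIn`, that only mirrors the constructor signature quoted above. With scikit-learn installed, the same helper could presumably be pointed at `sklearn.cluster.DBSCAN` instead, but that is not verified here.

```python
import inspect

class DBSCANStandIn:
    """Hypothetical stand-in that mirrors the quoted constructor signature."""

    def __init__(self, eps=0.5, min_samples=5, metric='euclidean',
                 algorithm='auto', leaf_size=30, p=None, n_jobs=1):
        self.eps = eps
        self.min_samples = min_samples

def constructor_defaults(cls):
    """Map each keyword parameter of cls.__init__ to its default value."""
    sig = inspect.signature(cls.__init__)
    return {name: param.default
            for name, param in sig.parameters.items()
            if param.default is not inspect.Parameter.empty}

defaults = constructor_defaults(DBSCANStandIn)

# Passing only eps leaves min_samples at its default value.
model = DBSCANStandIn(eps=0.1)
```

If the toolkit forwards only `eps` to a constructor like this, the clustering would still run, just with the default `min_samples` fixed for every search.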

| eval message="Happy Splunking!!!"

## Re: Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm

You need to modify the `$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos/DBSCAN.py` file. In the `__init__` function, replace the line

```python
out_params = convert_params(options.get('params', {}), floats=['eps'])
```

with this one:

```python
out_params = convert_params(options.get('params', {}), floats=['eps', 'min_samples'])
```

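After this you can write something like `fit DBSCAN eps=0.1 min_samples=2 *` in your SPL queries. Note that `min_samples` is an integer count, so if `convert_params` in your toolkit version also accepts an `ints` list, `ints=['min_samples']` is probably the more accurate choice.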
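To illustrate why the parameter has to be listed at all, here is a rough sketch of what a helper like `convert_params` plausibly does. The function `convert_params_sketch` is a hypothetical stand-in written for this post, not the toolkit's implementation, and it assumes that options arrive from SPL as strings and that unlisted names are rejected.

```python
def convert_params_sketch(params, floats=(), ints=()):
    """Hypothetical stand-in, NOT the toolkit's convert_params.

    Converts string-valued options to the declared types and rejects
    any option name that was not declared (an assumption of this sketch).
    """
    out = {}
    for name, raw in params.items():
        if name in floats:
            out[name] = float(raw)
        elif name in ints:
            out[name] = int(raw)
        else:
            raise ValueError("Unexpected parameter: %s" % name)
    return out

# Options as they might arrive from `fit DBSCAN eps=0.1 min_samples=2 *`.
spl_options = {'eps': '0.1', 'min_samples': '2'}

as_floats = convert_params_sketch(spl_options, floats=['eps', 'min_samples'])
as_int = convert_params_sketch(spl_options, floats=['eps'], ints=['min_samples'])

try:
    convert_params_sketch(spl_options, floats=['eps'])
    rejected = False
except ValueError:
    rejected = True  # min_samples was not declared, so it is refused
```

Under these assumptions, declaring `min_samples` as a float passes `2.0` on to the algorithm, while declaring it as an int passes `2`.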