<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm in All Apps and Add-ons</title>
    <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329482#M39468</link>
    <description>&lt;P&gt;You need to modify the &lt;CODE&gt;$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos/DBSCAN.py&lt;/CODE&gt; file. In the &lt;CODE&gt;__init__&lt;/CODE&gt; method, replace the line&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;out_params = convert_params(options.get('params', {}), floats=['eps'])
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;with this one. It also passes &lt;CODE&gt;min_samples&lt;/CODE&gt; through, converted to an integer because it is a count of samples:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;out_params = convert_params(options.get('params', {}), floats=['eps'], ints=['min_samples'])
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;After this change you can write something like &lt;CODE&gt;fit DBSCAN eps=0.1 min_samples=2 *&lt;/CODE&gt; in your SPL queries.&lt;/P&gt;</description>
    <pubDate>Tue, 29 Sep 2020 16:48:08 GMT</pubDate>
    <dc:creator>nryabykh</dc:creator>
    <dc:date>2020-09-29T16:48:08Z</dc:date>
    <item>
      <title>Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329480#M39466</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I want to use the clustering algorithm "DBSCAN" from the Machine Learning Toolkit.&lt;BR /&gt;
(&lt;A href="https://docs.splunk.com/Documentation/MLApp/2.3.0/User/Algorithms"&gt;https://docs.splunk.com/Documentation/MLApp/2.3.0/User/Algorithms&lt;/A&gt;) --&amp;gt; listed under "clustering algorithms"&lt;/P&gt;

&lt;P&gt;However, when implementing it, I noticed that this algorithm only takes one parameter: eps&lt;BR /&gt;
(the maximum distance between two samples for one to be considered in the neighborhood of the other)&lt;/P&gt;

&lt;P&gt;If you look up any definition of the DBSCAN algorithm, for example...&lt;BR /&gt;
(&lt;A href="https://en.wikipedia.org/wiki/DBSCAN"&gt;https://en.wikipedia.org/wiki/DBSCAN&lt;/A&gt;)&lt;BR /&gt;
...you will notice that DBSCAN needs two parameters to work:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;eps (epsilon): maximum distance between two samples --&amp;gt; provided&lt;/LI&gt;
&lt;LI&gt;minPts: minimum number of points (the point itself included) that must lie within eps of a point for it to be a core point of a cluster --&amp;gt; missing&lt;/LI&gt;
&lt;/UL&gt;
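&lt;P&gt;To make the role of both parameters concrete, here is a minimal pure-Python sketch of textbook DBSCAN, written for illustration only (it is not the MLTK or scikit-learn implementation, and &lt;CODE&gt;region_query&lt;/CODE&gt; and &lt;CODE&gt;dbscan&lt;/CODE&gt; are made-up names). A point starts or extends a cluster only if at least minPts points, itself included, lie within eps of it, so with eps held fixed the result still depends on minPts:&lt;/P&gt;

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], itself included."""
    return [j for j, q in enumerate(points) if eps >= math.dist(points[i], q)]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN sketch: one cluster label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if min_pts > len(neighbors):
            labels[i] = NOISE  # not a core point; may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, -1, -1, -1]
print(dbscan(points, eps=1.5, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

&lt;P&gt;With eps fixed at 1.5, lowering minPts from 3 to 2 turns the pair at (10, 10) and (10, 11) from noise into a second cluster, while the isolated point at (50, 50) stays noise either way.&lt;/P&gt;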

&lt;P&gt;Does anybody know why the second parameter is missing?&lt;BR /&gt;
I don't see how the algorithm can work without it.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jul 2017 10:17:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329480#M39466</guid>
      <dc:creator>hbrandt84</dc:creator>
      <dc:date>2017-07-25T10:17:42Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329481#M39467</link>
      <description>&lt;P&gt;@hbrandt84, I concur: scikit-learn also lists two parameters, &lt;CODE&gt;min_samples&lt;/CODE&gt; and &lt;CODE&gt;eps&lt;/CODE&gt; (&lt;A href="http://scikit-learn.org/stable/modules/clustering.html#dbscan"&gt;http://scikit-learn.org/stable/modules/clustering.html#dbscan&lt;/A&gt;)&lt;/P&gt;

&lt;P&gt;However, the class reference shows that both parameters are optional, since they have default values:&lt;BR /&gt;
&lt;A href="http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html"&gt;http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Based on the following code from scikit-learn's DBSCAN implementation, I would expect the default value to be &lt;CODE&gt;min_samples=5&lt;/CODE&gt; (&lt;A href="https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/cluster/dbscan_.py#L156"&gt;https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/cluster/dbscan_.py#L156&lt;/A&gt;):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;def dbscan(X, eps=0.5, min_samples=5, metric='minkowski',
           algorithm='auto', leaf_size=30, p=2, sample_weight=None, n_jobs=1):
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;And the &lt;CODE&gt;DBSCAN&lt;/CODE&gt; class constructor:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;def __init__(self, eps=0.5, min_samples=5, metric='euclidean',
             algorithm='auto', leaf_size=30, p=None, n_jobs=1):
    self.eps = eps
    self.min_samples = min_samples
    self.metric = metric
    self.algorithm = algorithm
    self.leaf_size = leaf_size
    self.p = p
    self.n_jobs = n_jobs
&lt;/CODE&gt;&lt;/PRE&gt;
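&lt;P&gt;If MLTK forwards only &lt;CODE&gt;eps&lt;/CODE&gt;, the estimator keeps that default of 5, which can matter a lot on small data. A quick pure-Python check of the DBSCAN core-point rule illustrates this; &lt;CODE&gt;core_points&lt;/CODE&gt; is a hypothetical helper written for this sketch, not scikit-learn code. With &lt;CODE&gt;min_samples=5&lt;/CODE&gt; none of the six sample points is a core point, so DBSCAN would label every one of them as noise, whereas &lt;CODE&gt;min_samples=3&lt;/CODE&gt; yields core points in both groups:&lt;/P&gt;

```python
import math


def core_points(points, eps, min_samples=5):
    """Return the indices of DBSCAN core points: points with at least
    min_samples neighbors (the point itself included) within distance eps.
    The default of 5 mirrors the scikit-learn default quoted above."""
    core = []
    for i, p in enumerate(points):
        n_neighbors = sum(1 for q in points if eps >= math.dist(p, q))
        if n_neighbors >= min_samples:
            core.append(i)
    return core


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (10, 12)]
print(core_points(points, eps=1.5))                 # [] -> every point would be noise
print(core_points(points, eps=1.5, min_samples=3))  # [0, 1, 2, 4]
```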

&lt;P&gt;However, this needs to be confirmed, and the Machine Learning Toolkit could possibly be enhanced to expose a &lt;CODE&gt;min_samples&lt;/CODE&gt; input parameter for DBSCAN.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jul 2017 18:31:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329481#M39467</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2017-07-25T18:31:52Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk Machine Learning App / Toolkit - Using DBSCAN Clustering Algorithm</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329482#M39468</link>
      <description>&lt;P&gt;You need to modify the &lt;CODE&gt;$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos/DBSCAN.py&lt;/CODE&gt; file. In the &lt;CODE&gt;__init__&lt;/CODE&gt; method, replace the line&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;out_params = convert_params(options.get('params', {}), floats=['eps'])
&lt;/CODE&gt;&lt;/PRE&gt;

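&lt;P&gt;with this one. It also passes &lt;CODE&gt;min_samples&lt;/CODE&gt; through, converted to an integer because it is a count of samples:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;out_params = convert_params(options.get('params', {}), floats=['eps'], ints=['min_samples'])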
&lt;/CODE&gt;&lt;/PRE&gt;
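&lt;P&gt;Why the type matters: assuming SPL options such as &lt;CODE&gt;min_samples=2&lt;/CODE&gt; reach the algorithm as strings, &lt;CODE&gt;convert_params&lt;/CODE&gt; casts each name according to the list it appears in. The following is a hypothetical stand-in written only for illustration (it is not MLTK's code, and &lt;CODE&gt;convert_params_sketch&lt;/CODE&gt; is a made-up name). Listing &lt;CODE&gt;min_samples&lt;/CODE&gt; under &lt;CODE&gt;floats&lt;/CODE&gt; yields &lt;CODE&gt;2.0&lt;/CODE&gt;, which recent scikit-learn releases may reject because they validate &lt;CODE&gt;min_samples&lt;/CODE&gt; as an integer, while an integer cast yields &lt;CODE&gt;2&lt;/CODE&gt;:&lt;/P&gt;

```python
def convert_params_sketch(params, floats=(), ints=()):
    """Hypothetical stand-in for MLTK's convert_params (NOT its real code):
    cast string-valued SPL options to the Python types the estimator expects."""
    out = {}
    for name, value in params.items():
        if name in floats:
            out[name] = float(value)
        elif name in ints:
            out[name] = int(value)  # raises ValueError for a non-integer such as '2.5'
        else:
            # this sketch simply rejects names it was not told about
            raise ValueError('Unexpected parameter: %s' % name)
    return out


spl_params = {'eps': '0.1', 'min_samples': '2'}  # as typed after "fit DBSCAN"
print(convert_params_sketch(spl_params, floats=['eps', 'min_samples']))
# {'eps': 0.1, 'min_samples': 2.0}
print(convert_params_sketch(spl_params, floats=['eps'], ints=['min_samples']))
# {'eps': 0.1, 'min_samples': 2}
```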

&lt;P&gt;After this change you can write something like &lt;CODE&gt;fit DBSCAN eps=0.1 min_samples=2 *&lt;/CODE&gt; in your SPL queries.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 16:48:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Splunk-Machine-Learning-App-Toolkit-Using-DBSCAN-Clustering/m-p/329482#M39468</guid>
      <dc:creator>nryabykh</dc:creator>
      <dc:date>2020-09-29T16:48:08Z</dc:date>
    </item>
  </channel>
</rss>

