<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events? in All Apps and Add-ons</title>
    <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394556#M48165</link>
    <description>&lt;P&gt;You can use this as a reference for adding class weight to your own algorithm, or you can use the GitHub algorithm directly for your case: &lt;A href="https://github.com/splunk/mltk-algo-contrib/blob/master/src/bin/algos_contrib/CustomDecisionTreeClassifier.py"&gt;https://github.com/splunk/mltk-algo-contrib/blob/master/src/bin/algos_contrib/CustomDecisionTreeClassifier.py&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 15 Nov 2018 19:24:05 GMT</pubDate>
    <dc:creator>grana_splunk</dc:creator>
    <dc:date>2018-11-15T19:24:05Z</dc:date>
    <item>
      <title>Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394553#M48162</link>
      <description>&lt;P&gt;Hello, &lt;/P&gt;

&lt;P&gt;I am using the Splunk Machine Learning Toolkit and would like to predict a rare event. The target variable has two values: "GOOD" and "BAD". "BAD" represents only 13% of the data. &lt;/P&gt;

&lt;P&gt;I use RandomForestClassifier for the prediction, but it has serious difficulty predicting "BAD". The confusion matrix is: &lt;/P&gt;

&lt;P&gt;Actual | Predicted GOOD | Predicted BAD |&lt;BR /&gt;
BAD      |  11.9% | 88.1% | &lt;BR /&gt;
GOOD | 19.4% | 80.6% | &lt;/P&gt;

&lt;P&gt;Of course, the model's overall scores look great, with a precision of 0.87 and an F1 of 0.85, because the result is GOOD most of the time. But it doesn't work for "BAD". &lt;/P&gt;

&lt;P&gt;How can I improve my model? Is it possible to use class_weight or something similar? &lt;/P&gt;

&lt;P&gt;Thank you in advance for your answer&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 16:22:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394553#M48162</guid>
      <dc:creator>marinelelievre</dc:creator>
      <dc:date>2018-11-15T16:22:30Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394554#M48163</link>
      <description>&lt;P&gt;You say your prediction accuracy for BAD is low, but in the table you shared, 88.1% of the actual BAD values are predicted as BAD, which looks great, while only 19.4% of the actual GOOD values are predicted as GOOD, which is not good. &lt;BR /&gt;
It looks like the table is inverted.&lt;/P&gt;
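
&lt;P&gt;To check which reading of the table is right, it can help to compute recall per actual class from the raw counts instead of relying on one overall score. The following is only a sketch: the counts are hypothetical (1000 events, 13% of them BAD) and are not your real data. With these counts, overall accuracy is 0.87 while recall for BAD is only about 0.23.&lt;/P&gt;

```python
def per_class_recall(matrix):
    """matrix[actual][predicted] holds raw counts; returns recall per actual class."""
    return {
        actual: row[actual] / sum(row.values())
        for actual, row in matrix.items()
    }


# Hypothetical counts for 1000 events, 13% of them BAD (not the poster's real data).
matrix = {
    "GOOD": {"GOOD": 840, "BAD": 30},
    "BAD": {"GOOD": 100, "BAD": 30},
}

recall = per_class_recall(matrix)
total = sum(sum(row.values()) for row in matrix.values())
# Overall accuracy: correctly classified events divided by all events.
accuracy = sum(matrix[cls][cls] for cls in matrix) / total
print(recall, accuracy)
```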

&lt;P&gt;Coming back to your problem of improving the accuracy of predicting BAD, there are three options:&lt;/P&gt;

&lt;P&gt;1) Get more data for the BAD cases, which would help the model learn them better. It is also possible that the fields used for prediction are not strongly related to the target variable, so adding new variables for prediction could help as well.&lt;BR /&gt;
2) Try different algorithms (although RandomForestClassifier is a good one).&lt;BR /&gt;
3) Use the MLSPL API &lt;A href="https://docs.splunk.com/Documentation/MLApp/4.0.0/API/Introduction"&gt;https://docs.splunk.com/Documentation/MLApp/4.0.0/API/Introduction&lt;/A&gt; to bring a resampling algorithm into MLTK, and use it to resample your data so that BAD and GOOD make up 50% each.&lt;BR /&gt;
Algorithms that can help: &lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html"&gt;https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html"&gt;https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
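
&lt;P&gt;As a rough illustration of option 3, random oversampling duplicates minority-class rows until the classes are the same size. The sketch below uses only Python's standard library as a stand-in for sklearn.utils.resample (it is not the MLTK API); the function name, the row layout, and the 87/13 split are assumptions made for the example.&lt;/P&gt;

```python
import random
from collections import Counter


def oversample_minority(rows, label_key, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 87 GOOD rows and 13 BAD rows, mirroring the 13% in the question.
rows = [{"x": i, "label": "GOOD"} for i in range(87)]
rows += [{"x": i, "label": "BAD"} for i in range(13)]

balanced = oversample_minority(rows, "label")
counts = Counter(row["label"] for row in balanced)
print(counts)
```

&lt;P&gt;Resample only the training data. If the evaluation set is resampled too, the metrics are computed on duplicated rows and no longer reflect the real class balance.&lt;/P&gt;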

&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 17:48:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394554#M48163</guid>
      <dc:creator>hkeswani_splunk</dc:creator>
      <dc:date>2018-11-15T17:48:45Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394555#M48164</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;
Class weight is something you can access by using the ML APIs and exposing that parameter in the algorithm's code: &lt;A href="https://docs.splunk.com/Documentation/MLApp/4.0.0/API/Introduction"&gt;https://docs.splunk.com/Documentation/MLApp/4.0.0/API/Introduction&lt;/A&gt;&lt;BR /&gt;
Alternatively, you can balance the classes yourself by sampling each class manually in SPL, and then run the | fit command on the balanced data.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 18:07:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394555#M48164</guid>
      <dc:creator>astein_splunk</dc:creator>
      <dc:date>2018-11-15T18:07:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394556#M48165</link>
      <description>&lt;P&gt;You can use this as a reference for adding class weight to your own algorithm, or you can use the GitHub algorithm directly for your case: &lt;A href="https://github.com/splunk/mltk-algo-contrib/blob/master/src/bin/algos_contrib/CustomDecisionTreeClassifier.py"&gt;https://github.com/splunk/mltk-algo-contrib/blob/master/src/bin/algos_contrib/CustomDecisionTreeClassifier.py&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 19:24:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394556#M48165</guid>
      <dc:creator>grana_splunk</dc:creator>
      <dc:date>2018-11-15T19:24:05Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394557#M48166</link>
      <description>&lt;P&gt;Thank you for your answer. But how can I sample by class manually with SPL?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Nov 2018 09:29:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394557#M48166</guid>
      <dc:creator>marinelelievre</dc:creator>
      <dc:date>2018-11-19T09:29:24Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394558#M48167</link>
      <description>&lt;P&gt;There are many ways depending on the type of sampling you wish to use.&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/MLApp/4.0.0/User/Customsearchcommands#sample" target="_blank"&gt;https://docs.splunk.com/Documentation/MLApp/4.0.0/User/Customsearchcommands#sample&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;.. | search fieldforclass="class_label_A" | sample partitions=100 seed=1001 | where partition_number&amp;lt;=70 | outputlookup class_label_A.csv&lt;/P&gt;

&lt;P&gt;.. | search fieldforclass="class_label_B" | sample partitions=100 seed=1001 | where partition_number&amp;lt;=70 | outputlookup class_label_B.csv&lt;/P&gt;

&lt;P&gt;Then combine the two like so:&lt;BR /&gt;
| inputlookup class_label_A.csv | append [| inputlookup class_label_B.csv ]&lt;/P&gt;

&lt;P&gt;Note that there are far more performant options if you use summary indexes or maybe even use the proportional option on the sample command itself. &lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 22:05:14 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394558#M48167</guid>
      <dc:creator>astein_splunk</dc:creator>
      <dc:date>2020-09-29T22:05:14Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394559#M48168</link>
      <description>&lt;P&gt;Thank you for your answer. I tried to use the algorithm after registering it, creating the Python script file, and adding the GitHub algorithm. &lt;BR /&gt;
But when I run the following search: &lt;BR /&gt;
... | fit CustomDecisionTreeClassifier splitter=best criterion=gini class_weight="{'GOOD':7,'BAD':1}" "explained_variable" from "explanatory_variable_1" "explanatory_variable_2" "explanatory_variable_3" .... into "test" as prediction&lt;BR /&gt;
I get an error: &lt;STRONG&gt;Error in 'fit' command: Error while saving model "test": Not JSON serializable: algos.CustomDecisionTreeClassifier.CustomDecisionTreeClassifier&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;I think it's because of the SimpleObjectCodec, but I don't really know how to fix it. &lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 22:06:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394559#M48168</guid>
      <dc:creator>marinelelievre</dc:creator>
      <dc:date>2020-09-29T22:06:08Z</dc:date>
    </item>
    <item>
      <title>Re: Using the Splunk Machine Learning Toolkit, how do you predict rare events?</title>
      <link>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394560#M48169</link>
      <description>&lt;P&gt;No, it's not a SimpleObjectCodec problem. I know the reason behind it, so let me explain.&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;When you are using the GitHub algorithms, you can install them as an app. Instructions are given in the README file.&lt;/LI&gt;
&lt;LI&gt;If you want to use one inside MLTK by copying the algorithm into the Toolkit, do the following:&lt;/LI&gt;
&lt;LI&gt;Copy the algo file to $SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos&lt;/LI&gt;
&lt;LI&gt;Open the file and change line 64 as shown below&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Change : codecs_manager.add_codec('algos_contrib.CustomDecisionTreeClassifier', 'CustomDecisionTreeClassifier', SimpleObjectCodec)&lt;BR /&gt;
To : codecs_manager.add_codec('algos.CustomDecisionTreeClassifier', 'CustomDecisionTreeClassifier', SimpleObjectCodec)&lt;/P&gt;

&lt;P&gt;That is, you are replacing "algos_contrib" with "algos".&lt;/P&gt;

&lt;P&gt;- Make sure to register your algorithm in algos.conf.&lt;/P&gt;

&lt;P&gt;- Restart Splunk and it will work for you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
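
&lt;P&gt;For picking the weight values themselves, one common heuristic is inverse class frequency, n_samples / (n_classes * class_count), which is the formula scikit-learn uses for class_weight="balanced". The sketch below is only an illustration: the function name and the 87/13 label split are assumptions, and whether a given MLTK algorithm accepts the resulting string depends on how that algorithm parses class_weight.&lt;/P&gt;

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical labels: 87 GOOD and 13 BAD, matching the 13% figure in the question.
labels = ["GOOD"] * 87 + ["BAD"] * 13
weights = balanced_class_weights(labels)

# Format as the quoted dictionary string used by the fit examples in this thread.
as_option = "{" + ",".join("'%s':%.2f" % (c, w) for c, w in weights.items()) + "}"
print(as_option)
```

&lt;P&gt;With this heuristic the rare class ends up with the larger weight, so errors on BAD cost the tree more than errors on GOOD.&lt;/P&gt;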

&lt;P&gt;Here is the syntax for using class_weight:&lt;BR /&gt;
  | fit DecisionTreeClassifier class_weight="{'Yes':1,'No':0.1}"&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 22:06:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/All-Apps-and-Add-ons/Using-the-Splunk-Machine-Learning-Toolkit-how-do-you-predict/m-p/394560#M48169</guid>
      <dc:creator>grana_splunk</dc:creator>
      <dc:date>2020-09-29T22:06:28Z</dc:date>
    </item>
  </channel>
</rss>

