Solved: Machine Learning Toolkit: fitting OneClassSVM algo...

lradics · ‎07-12-2017

I'm trying to use the OneClassSVM algorithm (thank you, @cmerriman !) to detect outliers in the reactionTime field of my data. As best as I can tell from the information on scikit-learn.org, OneClassSVM is a novelty detection algorithm, meaning that when I use the "fit" command, it will determine a boundary that fits around most-if-not-all of the data I've given it, and deem those data points "normal." When I do so, however, 68% of my data ends up being marked "abnormal."

Here's the SPL I'm using:

index=xxx source="xxx" reactionTime user=lradics | where reactionTime < 10000 | where reactionTime > 300 | dedup ID | fit OneClassSVM reactionTime into rxn_time_model | table isNormal, reactionTime

I don't have much experience with this sort of thing, so I'm suspecting it's probably a user error, but I can't find where I would've gone wrong. Is my understanding of the algorithm's behavior correct? Can anyone point me to what I should change?

Thank you!

cmerriman · ‎07-12-2017

The documentation for that algorithm, including its options, is here:
https://docs.splunk.com/Documentation/MLApp/2.2.0/User/Algorithms#Anomaly_Detectors
You can set the kernel to linear, poly, etc. (the default is rbf, the Gaussian radial basis function), as well as nu, the upper bound on the fraction of training errors, whose default is 0.5.

Machine learning is a lot of practice and trial and error. Play with the options while you're fitting your training set until you see the results you want.

Another useful doc is the cheat sheet:
http://docs.splunk.com/images/e/ee/MLTKCheatSheet.pdf
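
As a rough sketch of that trial-and-error loop, the search below reuses the query from the question, sets nu explicitly, and counts how many events land in each isNormal class. The value nu=0.05 is only an illustrative assumption, not a recommendation. Because nu bounds the fraction of training points treated as errors, the default of 0.5 lets the model treat a large share of the training data as outliers.

```spl
index=xxx source="xxx" reactionTime user=lradics
| where reactionTime < 10000 AND reactionTime > 300
| dedup ID
| fit OneClassSVM reactionTime nu=0.05 into rxn_time_model
| stats count by isNormal
```

Rerunning this with different nu and kernel values and comparing the counts is one simple way to see how each option changes the split.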

niketn · ‎07-12-2017

@lradics, have you tried adjusting the other parameters for OneClassSVM?

The list of parameters and the following example are available in the documentation: http://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#OneClassSVM

 kernel="poly" nu=0.5 coef0=0.5 gamma=0.5 tol=1 degree=3 shrinking=f

lradics · ‎07-12-2017

Thank you! I ended up switching the kernel to linear, and making nu much smaller (0.0001), and that worked. I'm curious why altering nu didn't affect the results I got with the default kernel... I'll read up on it some more 🙂
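
For later readers, the settings described above would look roughly like this in the original search. This is a sketch reconstructed from the post, not the exact query that was run, and the final stats line is an added assumption for checking how the events split.

```spl
index=xxx source="xxx" reactionTime user=lradics
| where reactionTime < 10000 AND reactionTime > 300
| dedup ID
| fit OneClassSVM reactionTime kernel="linear" nu=0.0001 into rxn_time_model
| stats count by isNormal
```

Once the model is saved with into, it should be possible to score new events against it by ending a search with | apply rxn_time_model instead of refitting.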
