Machine Learning Toolkit: fitting OneClassSVM algorithm

lradics
Path Finder

I'm trying to use the OneClassSVM algorithm (thank you, @cmerriman !) to detect outliers in the reactionTime field of my data. As best as I can tell from the information on scikit-learn.org, OneClassSVM is a novelty detection algorithm, meaning that when I use the "fit" command, it will determine a boundary that fits around most-if-not-all of the data I've given it, and deem those data points "normal." When I do so, however, 68% of my data ends up being marked "abnormal."

Here's the SPL I'm using:

index=xxx source="xxx" reactionTime user=lradics | where reactionTime < 10000 | where reactionTime > 300 | dedup ID | fit OneClassSVM reactionTime into rxn_time_model | table isNormal, reactionTime  

I don't have much experience with this sort of thing, so I'm suspecting it's probably a user error, but I can't find where I would've gone wrong. Is my understanding of the algorithm's behavior correct? Can anyone point me to what I should change?

Thank you!

cmerriman
Super Champion

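The documentation for that algorithm, including its options, is here:
https://docs.splunk.com/Documentation/MLApp/2.2.0/User/Algorithms#Anomaly_Detectors
You can set the kernel to linear, poly, etc. (the default is rbf, the Gaussian radial basis function), as well as nu, the upper bound on the fraction of training errors (default 0.5). With nu left at 0.5, the model is allowed to treat up to roughly half of the training points as outliers, which is consistent with a large share of your data being marked abnormal.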

Machine learning is a lot of practice and trial and error. Play with the options while you're fitting your training set until you see the results you want.
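
One way to run that trial and error is to fit with a candidate value and count how many events land in each class. A minimal sketch, assuming the base search from the question; nu=0.05 is an arbitrary illustrative value, not a recommendation:

```
index=xxx source="xxx" reactionTime user=lradics
| where reactionTime < 10000
| where reactionTime > 300
| dedup ID
| fit OneClassSVM reactionTime nu=0.05
| stats count by isNormal
```

Re-running this with different kernel and nu settings shows how the split between normal and abnormal events shifts; how much it shifts will depend on the data.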

Another useful doc is the cheat sheet:
http://docs.splunk.com/images/e/ee/MLTKCheatSheet.pdf

niketn
Legend

@lradics, have you tried adjusting the other parameters for OneClassSVM?

The list of parameters and the following example are available in the documentation: http://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#OneClassSVM

 kernel="poly" nu=0.5 coef0=0.5 gamma=0.5 tol=1 degree=3 shrinking=f

cmerriman
Super Champion

the documentation for that algorithm is here with options:
https://docs.splunk.com/Documentation/MLApp/2.2.0/User/Algorithms#Anomaly_Detectors
you can set the kernel to be linear, poly, etc., the default is rbf (radial basis function - Gaussian) as well as the bound for training error (nu) and the default is 0.5.

Machine learning is a lot of practice and trial and error. Play with the options while you're fitting your training set until you see the results you want.

another useful doc is the cheatsheet.
http://docs.splunk.com/images/e/ee/MLTKCheatSheet.pdf

0 Karma

lradics
Path Finder

Thank you! I ended up switching the kernel to linear, and making nu much smaller (0.0001), and that worked. I'm curious why altering nu didn't affect the results I got with the default kernel... I'll read up on it some more 🙂
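
For reference, the adjusted search described here might look like the sketch below. It assumes everything in the original search stays the same and only the fit options change; the option placement (before "into") follows the documentation example quoted earlier in the thread:

```
index=xxx source="xxx" reactionTime user=lradics
| where reactionTime < 10000
| where reactionTime > 300
| dedup ID
| fit OneClassSVM reactionTime kernel="linear" nu=0.0001 into rxn_time_model
| table isNormal, reactionTime
```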
