Machine Learning Toolkit: fitting OneClassSVM algorithm

lradics
Path Finder

I'm trying to use the OneClassSVM algorithm (thank you, @cmerriman!) to detect outliers in the reactionTime field of my data. As far as I can tell from the information on scikit-learn.org, OneClassSVM is a novelty detection algorithm, meaning that when I use the "fit" command, it determines a boundary that fits around most, if not all, of the data I've given it and deems those data points "normal." When I do so, however, 68% of my data ends up marked "abnormal."

Here's the SPL I'm using:

index=xxx source="xxx" reactionTime user=lradics | where reactionTime < 10000 | where reactionTime > 300 | dedup ID | fit OneClassSVM reactionTime into rxn_time_model | table isNormal, reactionTime  

I don't have much experience with this sort of thing, so I suspect it's user error, but I can't find where I went wrong. Is my understanding of the algorithm's behavior correct? Can anyone point me to what I should change?

Thank you!

cmerriman
Super Champion

The documentation for that algorithm, including its options, is here:
https://docs.splunk.com/Documentation/MLApp/2.2.0/User/Algorithms#Anomaly_Detectors
You can set the kernel to linear, poly, etc. (the default is rbf, the Gaussian radial basis function), as well as nu, the bound on training error, whose default is 0.5. In scikit-learn's terms, nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.

Machine learning involves a lot of practice and trial and error. Play with the options while you're fitting your training set until you see the results you want.

Another useful doc is the MLTK cheat sheet:
http://docs.splunk.com/images/e/ee/MLTKCheatSheet.pdf

niketn
Legend

@lradics, have you tried adjusting the other parameters for OneClassSVM?

The list of parameters, along with the following example, is available in the documentation: http://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#OneClassSVM

 kernel="poly" nu=0.5 coef0=0.5 gamma=0.5 tol=1 degree=3 shrinking=f
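For reference, here is a sketch of the same options passed straight to scikit-learn's `OneClassSVM` constructor. It assumes MLTK forwards them unchanged, with `shrinking=f` becoming `shrinking=False`; the data is hypothetical and already on a unit scale, standing in for reactionTime.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical 1-D data on a unit scale, standing in for reactionTime.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))

# The documented MLTK example options as scikit-learn constructor arguments.
# Assumption: MLTK forwards them unchanged; shrinking=f maps to False.
params = dict(kernel="poly", nu=0.5, coef0=0.5, gamma=0.5,
              tol=1, degree=3, shrinking=False)
clf = OneClassSVM(**params).fit(X)

labels = clf.predict(X)  # +1 = inside the boundary, -1 = outside
print("kernel:", clf.kernel, "| labels seen:", sorted(set(labels.tolist())))
```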

lradics
Path Finder

Thank you! I ended up switching the kernel to linear and making nu much smaller (0.0001), and that worked. I'm curious why changing nu didn't affect the results I got with the default kernel... I'll read up on it some more 🙂
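One plausible explanation, offered as an assumption rather than a confirmed diagnosis: if the rbf kernel ran with scikit-learn's older default `gamma='auto'` (1/n_features, so 1.0 for a single field) on raw millisecond values, the kernel is so narrow that no two distinct reaction times influence each other. Every point then becomes a support vector sitting on the boundary, every training score is about 0, and the labels hinge on rounding-level differences rather than on nu. The sketch below shows this with hypothetical reaction times spaced 50 ms apart across the 300 to 10000 window from the search above.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical reaction times (ms), 50 ms apart across the 300-10000 window.
X = np.arange(300, 10000, 50, dtype=float).reshape(-1, 1)

for nu in (0.5, 0.01):
    # gamma=1.0 mimics gamma='auto' (1/n_features) on one unscaled field (assumption).
    clf = OneClassSVM(kernel="rbf", gamma=1.0, nu=nu).fit(X)
    scores = clf.decision_function(X)
    # exp(-1.0 * 50**2) underflows to 0.0, so each point only "sees" itself:
    # every point becomes a support vector and every score is close to 0.
    print(f"nu={nu}: {len(clf.support_)}/{len(X)} support vectors, "
          f"max |score| = {np.abs(scores).max():.1e}")
```

If this is what happened, scaling reactionTime before fitting (for example with MLTK's StandardScaler) or passing a much smaller gamma should let nu control the share of flagged points again.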
