
Machine Learning Toolkit: fitting OneClassSVM algorithm

lradics
Path Finder

I'm trying to use the OneClassSVM algorithm (thank you, @cmerriman!) to detect outliers in the reactionTime field of my data. As far as I can tell from the information on scikit-learn.org, OneClassSVM is a novelty detection algorithm. That means when I use the "fit" command, it determines a boundary that fits around most, if not all, of the data I've given it and deems those data points "normal." When I do so, however, 68% of my data ends up marked "abnormal."
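To make the "boundary" idea concrete: a fitted one-class SVM scores a point by its weighted kernel similarity to a set of support vectors, minus an offset rho, and labels the point normal when that score is positive. The sketch below uses the rbf kernel with made-up support vectors, weights, rho and gamma. All of these values are hypothetical and are not output from MLTK or from this data. It also assumes the +1/-1 labelling that scikit-learn's OneClassSVM.predict uses.

```python
import math

def rbf(x: float, y: float, gamma: float) -> float:
    # Gaussian RBF kernel for a single numeric field: exp(-gamma * (x - y)**2)
    return math.exp(-gamma * (x - y) ** 2)

def decision(x, support_vectors, alphas, rho, gamma):
    # One-class SVM decision value: weighted kernel similarity to the
    # support vectors, minus the offset rho. The boundary is decision(x) == 0.
    return sum(a * rbf(sv, x, gamma) for sv, a in zip(support_vectors, alphas)) - rho

def is_normal(x, support_vectors, alphas, rho, gamma):
    # +1 inside the boundary (normal), -1 otherwise (outlier)
    return 1 if decision(x, support_vectors, alphas, rho, gamma) > 0 else -1

# Hypothetical fitted values on a standardized field (not learned from real data)
SUPPORT_VECTORS = [-1.0, 0.0, 1.0]
ALPHAS = [0.3, 0.4, 0.3]
RHO = 0.5
GAMMA = 0.5

print(is_normal(0.0, SUPPORT_VECTORS, ALPHAS, RHO, GAMMA))  # 1: near the bulk of the data
print(is_normal(5.0, SUPPORT_VECTORS, ALPHAS, RHO, GAMMA))  # -1: far outside the boundary
```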

Here's the SPL I'm using:

index=xxx source="xxx" reactionTime user=lradics | where reactionTime < 10000 | where reactionTime > 300 | dedup ID | fit OneClassSVM reactionTime into rxn_time_model | table isNormal, reactionTime  

I don't have much experience with this sort of thing, so I suspect it's user error, but I can't find where I went wrong. Is my understanding of the algorithm's behavior correct? Can anyone point me to what I should change?

Thank you!


cmerriman
Super Champion

The documentation for that algorithm, including its options, is here:
https://docs.splunk.com/Documentation/MLApp/2.2.0/User/Algorithms#Anomaly_Detectors
You can set the kernel (linear, poly, etc.; the default is rbf, the Gaussian radial basis function) as well as nu, an upper bound on the fraction of training points treated as errors, whose default is 0.5.
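For reference, the one-class SVM that scikit-learn implements (and that MLTK's OneClassSVM wraps) has what is usually called the ν-property. For the fitted solution on a training set of n points:

```latex
\frac{\lvert\{\, i : x_i \text{ is flagged as an outlier} \,\}\rvert}{n}
\;\le\; \nu \;\le\;
\frac{\lvert\{\, i : x_i \text{ is a support vector} \,\}\rvert}{n}
```

So with the default ν = 0.5, the fit is allowed to leave up to half of the training points outside the boundary. It is not trying to enclose all of them, and lowering ν shrinks that allowance.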

Machine learning is a lot of practice and trial and error. Play with the options while you're fitting your training set until you see the results you want.

Another useful doc is the MLTK cheat sheet:
http://docs.splunk.com/images/e/ee/MLTKCheatSheet.pdf


niketn
Legend

@lradics, have you tried adjusting the other OneClassSVM parameters?

The list of parameters and an example are in the documentation: http://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#OneClassSVM

Applied to your search, that example's parameters would look like this:

... | fit OneClassSVM reactionTime kernel="poly" nu=0.5 coef0=0.5 gamma=0.5 tol=1 degree=3 shrinking=f into rxn_time_model
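Here is how the poly parameters in that example combine. scikit-learn, whose OneClassSVM the MLTK algorithm wraps, defines the polynomial kernel as K(x, y) = (gamma * <x, y> + coef0)^degree. The minimal sketch below applies it to a single numeric field using the example's gamma, coef0 and degree. It shows that the kernel stays moderate on unit-scale inputs but grows by many orders of magnitude on raw millisecond values. The input values are illustrative only and are not from the poster's data.

```python
def poly_kernel(x: float, y: float, gamma: float, coef0: float, degree: int) -> float:
    # Polynomial kernel for a single numeric field: (gamma * x * y + coef0) ** degree
    return (gamma * x * y + coef0) ** degree

# Values from the documentation example: gamma=0.5 coef0=0.5 degree=3
GAMMA, COEF0, DEGREE = 0.5, 0.5, 3

unit_scale = poly_kernel(1.0, 2.0, GAMMA, COEF0, DEGREE)     # (0.5*2 + 0.5)**3 = 3.375
raw_ms = poly_kernel(300.0, 10000.0, GAMMA, COEF0, DEGREE)   # about 3.4e18

print(unit_scale)
print(f"{raw_ms:.3g}")
```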


lradics
Path Finder

Thank you! I ended up switching the kernel to linear and making nu much smaller (0.0001), and that worked. I'm curious why altering nu didn't affect the results I got with the default kernel... I'll read up on it some more 🙂
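Why might nu have had no visible effect with the default rbf kernel? One plausible explanation follows, offered as a hypothesis rather than a diagnosis of this data. The rbf kernel is K(x, y) = exp(-gamma * (x - y)^2). Suppose gamma defaults to 1/n_features, as with scikit-learn's gamma='auto' (the default in older scikit-learn releases). Then gamma is 1 for a single field. On raw milliseconds, any two reaction times more than a few milliseconds apart get a kernel value of essentially zero, so the model treats almost every point as unrelated to every other. That could be one reason changing nu made little difference. The sketch below compares the kernel on raw and standardized (z-scored) reaction times. The reaction times are hypothetical, and gamma = 1 is an assumption.

```python
import math

def rbf_kernel(x: float, y: float, gamma: float) -> float:
    # Gaussian RBF kernel for a single field: exp(-gamma * (x - y)**2)
    return math.exp(-gamma * (x - y) ** 2)

def standardize(values):
    # z-score each value: subtract the mean, divide by the population std dev
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Hypothetical reaction times in milliseconds (illustrative, not the poster's data)
raw = [350.0, 420.0, 480.0, 510.0, 2500.0]
GAMMA = 1.0  # assumption: gamma = 1 / n_features with a single field

z = standardize(raw)

near_raw = rbf_kernel(raw[0], raw[1], GAMMA)  # 70 ms apart: exp(-4900) underflows to 0.0
near_z = rbf_kernel(z[0], z[1], GAMMA)        # same pair after scaling: close to 1
far_z = rbf_kernel(z[0], z[4], GAMMA)         # 350 ms vs 2500 ms after scaling: close to 0

print(f"raw ms, near pair:       {near_raw:.3g}")
print(f"standardized, near pair: {near_z:.3g}")
print(f"standardized, far pair:  {far_z:.3g}")
```

After scaling, nearby reaction times look similar and the 2500 ms value still stands apart, which gives nu a meaningful boundary to tune. MLTK's StandardScaler preprocessing algorithm can do this scaling in the search before the fit command. The linear kernel that worked here has no gamma parameter at all.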
