I'm trying to use the OneClassSVM algorithm (thank you, @cmerriman!) to detect outliers in the reactionTime field of my data. As best as I can tell from the information on scikit-learn.org, OneClassSVM is a novelty detection algorithm, meaning that when I use the "fit" command, it will determine a boundary that fits around most-if-not-all of the data I've given it, and deem those data points "normal." When I do so, however, 68% of my data ends up being marked "abnormal."
Here's the SPL I'm using:
index=xxx source="xxx" reactionTime user=lradics | where reactionTime < 10000 | where reactionTime > 300 | dedup ID | fit OneClassSVM reactionTime into rxn_time_model | table isNormal, reactionTime
I don't have much experience with this sort of thing, so I'm suspecting it's probably a user error, but I can't find where I would've gone wrong. Is my understanding of the algorithm's behavior correct? Can anyone point me to what I should change?
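The documentation for that algorithm, including its options, is here:
You can set the kernel to linear, poly, etc. (the default is rbf, the radial basis function, i.e. Gaussian), and you can set nu, the bound on training error, which defaults to 0.5. nu is an upper bound on the fraction of training errors, so a large value lets the model treat a large share of the training data as outliers.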
Machine learning is a lot of practice and trial and error. Play with the options while you're fitting your training set until you see the results you want.
Another useful doc is the cheatsheet.
Thank you! I ended up switching the kernel to linear, and making nu much smaller (0.0001), and that worked. I'm curious why altering nu didn't affect the results I got with the default kernel... I'll read up on it some more 🙂
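For anyone following along, here is a rough sketch of what the adjusted search might look like. It assumes the options are passed as key=value pairs before `into`, as in the example in the MLApp documentation. The final `stats count by isNormal` is an addition for illustration only: it replaces the original `table` so you can see how many events fall on each side of the boundary.

```
index=xxx source="xxx" reactionTime user=lradics
| where reactionTime < 10000
| where reactionTime > 300
| dedup ID
| fit OneClassSVM reactionTime kernel="linear" nu=0.0001 into rxn_time_model
| stats count by isNormal
```

Re-running this with different kernel and nu values and comparing the counts is a quick way to see how each setting shifts the normal/abnormal split.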
@iradics, have you tried adjusting the other OneClassSVM parameters?
The list of parameters and the following example are available in the documentation: http://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#OneClassSVM
kernel="poly" nu=0.5 coef0=0.5 gamma=0.5 tol=1 degree=3 shrinking=f