Hi Splunk,
I work for a corporate partner and am interested in the capabilities of your new Machine Learning Toolkit.
I wrote several Python scripts using the Splunk SDK for Python to do the following, but I would like to be able to do this directly in Splunk, via the Machine Learning Toolkit or a dashboard:
(1) I want to consider all pairwise event field differences for N events with M fields each. This results in N(N-1)/2 ≈ N^2/2 vectors (v_2,1, v_3,1, v_3,2, …), each of length M (computing all N^2 vectors would be redundant, as v_j,k = v_k,j).
(2) From these event field differences, I perform binary classification using a linear weight vector w determined by a linear model such as LDA or logistic regression, i.e. if [v_j,k] * [w_1 … w_M].T < some threshold, then events j and k are similar.
(3) I then perform single-link clustering over all events deemed similar.
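For concreteness, the three steps above can be sketched in plain Python roughly as follows. This is only a minimal illustration, not my actual scripts and not anything from the toolkit: the function names are made up, the events are assumed to be already extracted as lists of numeric field values, the differences are assumed to be absolute (so that v_j,k = v_k,j holds), and the weights w and the threshold are assumed to come from a model trained elsewhere. Single-link clustering at a fixed threshold is done here by merging connected similar pairs with a union-find.

```python
from itertools import combinations


def pairwise_differences(events):
    """Step 1: map (j, k) with j > k to the absolute field differences.

    Assumption: absolute differences, so v_j,k equals v_k,j.
    """
    return {
        (j, k): [abs(a - b) for a, b in zip(events[j], events[k])]
        for k, j in combinations(range(len(events)), 2)
    }


def similar_pairs(diffs, w, threshold):
    """Step 2: keep pairs whose linear score v . w is below the threshold."""
    return [
        pair
        for pair, v in diffs.items()
        if sum(vi * wi for vi, wi in zip(v, w)) < threshold
    ]


def single_link_clusters(n, pairs):
    """Step 3: merge events connected by any chain of similar pairs."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for j, k in pairs:
        rj, rk = find(j), find(k)
        if rj != rk:
            parent[max(rj, rk)] = min(rj, rk)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical data: N = 4 events with M = 2 numeric fields each.
events = [[1.0, 2.0], [1.1, 2.1], [5.0, 5.0], [5.2, 4.9]]
w = [1.0, 1.0]      # placeholder weights, e.g. from LDA or logistic regression
threshold = 0.5     # placeholder decision threshold

diffs = pairwise_differences(events)
clusters = single_link_clusters(len(events), similar_pairs(diffs, w, threshold))
print(clusters)
```

With this toy data, events 0 and 1 end up in one cluster and events 2 and 3 in another. Note that the number of pairs grows quadratically with N, which is part of why I would prefer to run this inside Splunk rather than pulling events out through the SDK.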
As the ML Toolkit implements clustering, I suppose that adaptations to the existing source code would allow one to do this, but I would like to know if there is an easier way.
Blake