Hello Splunkers!
I want to add an algorithm to the Machine Learning Toolkit.
Is the finished example at the link below, CorrelationMatrix, the algorithm for Pearson, Kendall, and Spearman correlation?
http://docs.splunk.com/Documentation/MLApp/2.2.0/API/CorrelationMatrix
I would like to validate this with you, as I'm not sure I'm understanding it correctly.
Much appreciated!
Hi Lloyd,
The correlations themselves are calculated by the pandas corr method:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.corr.html
This means we have options for which type of correlation is calculated. In step four, we validate that the method the user provided is one of the three valid candidates listed in the documentation for the pandas corr method:
valid_methods = ['spearman', 'kendall', 'pearson']
# Check to see if parameters exist
params = options.get('params', {})
# Check if method is in parameters in search
if 'method' in params:
    if params['method'] not in valid_methods:
        error_msg = 'Invalid value for method: must be one of {}'.format(
            ', '.join(valid_methods))
        raise RuntimeError(error_msg)
    # Assign method to self for later usage
    self.method = params['method']
This means that the user can run any of the following searches:
| fit CorrelationMatrix method=spearman <fieldlist>
or
| fit CorrelationMatrix method=kendall <fieldlist>
or
| fit CorrelationMatrix method=pearson <fieldlist>
Then, if you look a little further, you'll see that if no method was provided, we set the default to pearson:
# Assign default method & ensure no other parameters are present
else:
    # Default method for correlation
    self.method = 'pearson'
So, to answer your question, all three are available, and pearson correlation is the default in the example.
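If it helps to see the method choice outside of Splunk, here is a minimal standalone sketch using pandas directly. The helper name correlation_matrix and the sample data are made up for illustration and are not part of the MLTK example; only the validation logic and the call to DataFrame.corr mirror what is described above.

```python
import pandas as pd

# The three methods accepted by pandas' DataFrame.corr
VALID_METHODS = ['spearman', 'kendall', 'pearson']


def correlation_matrix(df, method='pearson'):
    """Hypothetical helper: validate the method, then defer to pandas."""
    if method not in VALID_METHODS:
        raise RuntimeError(
            'Invalid value for method: must be one of {}'.format(
                ', '.join(VALID_METHODS)))
    return df.corr(method=method)


# Made-up sample data: y grows monotonically with x, but not linearly
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1, 4, 9, 16, 25]})

pearson = correlation_matrix(df)                      # default method
spearman = correlation_matrix(df, method='spearman')  # rank-based

print(pearson)
print(spearman)
```

Because y increases monotonically with x but not linearly, the rank-based Spearman coefficient comes out as 1 while the Pearson coefficient is slightly below 1, which is the kind of difference the method option lets you choose between.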
Thank you very much!