R squared negative

diasna1
New Member

During the development of a project, we ran into a problem with the R^2 value (the coefficient of determination) reported by the Splunk Machine Learning Toolkit, version 3.1.0.

When we evaluate the performance of a model fitted to a time series, we obtain a negative R^2.

If we consider the R^2 as:

R^2 = SSR / SST

where
SSR = \sum_i (y_i - \hat{y}_i)^2 stands for Regression Sum of Squares and
SST = \sum_i (y_i - \overline{y})^2 for Total Sum of Squares,
it should be impossible to obtain a negative value. However, we are not sure how the algorithm actually calculates it.
We suspect that the adjusted R^2 is being reported, since that measure can be negative in some situations. We are not sure of this, though, nor of how the calculation is extended to time series, and we haven’t found any documentation about it so far.
We hope that somebody can help us figure out what’s going on 🙂
All the best with your Splunk queries and work!


aljohnson_splun
Splunk Employee

Hello @diasna1 - here is how we actually calculate the R^2 value as of MLTK 3.2.0:

| rename "$a$" as _actual, "$p$" as _predicted 
| eventstats avg(_actual) as _avgActual 
| eval _actualMinusAvg = _actual - _avgActual, _residual = _actual - _predicted 
| stats sumsq(_actualMinusAvg) as _sumsqActualMinusAvg, sumsq(_residual) as _sumsqResidual, count(_residual) as _sampleCount 
| eval rSquared = round(1 - _sumsqResidual / _sumsqActualMinusAvg, 4), RMSE = round(sqrt(_sumsqResidual / _sampleCount), 2) 
| table rSquared RMSE


Please feel free to review the similar definition on Wikipedia, and let me know if I made a mistake.

https://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions

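So to answer your question directly: the value is computed as 1 - SSR/SST, where SSR is the sum of squared residuals (as in your definition), not as SSR/SST. R^2 can therefore be negative whenever SSR is greater than SST, that is, whenever the predictions fit the data worse than simply predicting the mean of the actual values.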
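For anyone who wants to check the arithmetic outside of Splunk, here is a minimal Python sketch that mirrors the search above. It is not the MLTK code itself; the function name `r_squared_and_rmse` and the sample data are made up for illustration. It shows a reasonable fit giving an R^2 close to 1 and a poor fit giving a negative R^2.

```python
import math

def r_squared_and_rmse(actual, predicted):
    """Mirror the SPL above: R^2 = 1 - SS_residual / SS_total, plus RMSE.

    Hypothetical helper for illustration only, not part of the MLTK.
    """
    n = len(actual)
    avg_actual = sum(actual) / n
    # Total sum of squares: spread of the actual values around their mean.
    sumsq_actual_minus_avg = sum((a - avg_actual) ** 2 for a in actual)
    # Residual sum of squares: spread of the actual values around the predictions.
    sumsq_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    r_squared = round(1 - sumsq_residual / sumsq_actual_minus_avg, 4)
    rmse = round(math.sqrt(sumsq_residual / n), 2)
    return r_squared, rmse

# Made-up sample data.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
good_fit = [1.1, 1.9, 3.2, 3.8, 5.1]   # close to the actual values
bad_fit = [5.0, 4.0, 3.0, 2.0, 1.0]    # worse than predicting the mean

good_r2, good_rmse = r_squared_and_rmse(actual, good_fit)
bad_r2, bad_rmse = r_squared_and_rmse(actual, bad_fit)
print(good_r2, good_rmse)
print(bad_r2, bad_rmse)
```

In the second case the residual sum of squares (40) is four times the total sum of squares (10), so the result is 1 - 4 = -3.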

Additionally, you may find this Cross Validated question and answer relevant: https://stats.stackexchange.com/a/12991/78566


walkerhound
Path Finder

When you hover to see what the R squared means, it says "The square of the correlation coefficient..." This would seem to imply that R squared would be positive. Should the information under the hover be changed?
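To illustrate the mismatch, here is a small Python sketch comparing the two quantities on made-up data. The helper names are hypothetical and this is not how the toolkit is implemented. The squared correlation coefficient can never be negative, while 1 - SS_residual / SS_total can be. As far as I know the two only coincide in special cases, such as an ordinary least squares fit with an intercept evaluated on its own training data.

```python
def squared_correlation(actual, predicted):
    """Square of the Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / (var_a * var_p) ** 0.5
    return r ** 2

def one_minus_ss_ratio(actual, predicted):
    """1 - SS_residual / SS_total, the form used in the search above."""
    mean_a = sum(actual) / len(actual)
    ss_total = sum((a - mean_a) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

# Made-up data: the predictions are perfectly anti-correlated with the actuals.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [5.0, 4.0, 3.0, 2.0, 1.0]

r2_from_correlation = squared_correlation(actual, predicted)
r2_from_ss_ratio = one_minus_ss_ratio(actual, predicted)
print(r2_from_correlation, r2_from_ss_ratio)
```

Here the squared correlation is 1 (a perfect linear relationship, just in the wrong direction), while the sum-of-squares form gives -3.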
