Re: [R] R vs. Excel (R-squared)

From: Douglas Bates <dmbates_at_gmail.com>
Date: Wed 25 Jan 2006 - 07:25:36 EST

On 1/24/06, Lance Westerhoff <lance@quantumbioinc.com> wrote:
>
> Hi-
>
> On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote:
>
> > Lance Westerhoff <lance@quantumbioinc.com> writes:
> >
> >> Hello All-
> >>
> >> I found an inconsistency between the R-squared reported in Excel vs.
> >> that in R, and I am wondering which (if any) may be correct and if
> >> this is a known issue. While it certainly wouldn't surprise me if
> >> Excel is just flat out wrong, I just want to make sure since the R-
> >> squared reported in R seems surprisingly high. Please let me know if
> >> this is the wrong list. Thanks!
> >
> > Excel is flat out wrong. As the name implies, R-squared values cannot
> > be less than zero (adjusted R-squared can, but I wouldn't think
> > that is what Excel does).
>
> I had thought the same thing, but then I came across the following
> site which states: "Note that it is possible to get a negative R-
> square for equations that do not contain a constant term. If R-square
> is defined as the proportion of variance explained by the fit, and if
> the fit is actually worse than just fitting a horizontal line, then R-
> square is negative. In this case, R-square cannot be interpreted as
> the square of a correlation." Since
>
> R^2 = 1 - (SSE/SST)
>
> I guess you can have SSE > SST which would result in a R^2 of less
> then 1.0. However, it still seems very strange which made me wonder
> what is going on in Excel needless to say!
>
> http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/
> ch_fitt9.html

This seems to be a case of using the wrong formula. R^2 should measure the amount of variation for which the given model accounts relative to the amount of variation for which the *appropriate* null model does not account. If you have a constant or intercept term in a linear model then the null model for comparison is one with the intercept only. If you have a linear model without an intercept term then the appropriate null model for comparison is the model that predicts all the responses as zero. Thus SST, the "corrected" total sum of squares, should be used when you have a model with an intercept term but the uncorrected total sum of squares should be used when you do not have an intercept term.

It is disappointing to see the MathWorks propagating such an elementary misconception.



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Wed Jan 25 07:44:35 2006

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:42:10 EST