From: Greg Snow <Greg.Snow_at_imail.org>

Date: Thu, 17 Mar 2011 11:46:50 -0600

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 17 Mar 2011 - 18:01:14 GMT

It is all a matter of what you are comparing to, that is, what the null model is. In most cases (standard regression) we compare a model with slope and intercept to an intercept-only model, so R-squared measures the effect of the slope. The intercept-only model fits a horizontal line through the mean of the y's, hence the subtraction of the mean. If we don't subtract the mean, R-squared can easily become meaningless. Here is an example where the model does contain an intercept (supplied by hand as a column of ones), but R computes the r-squared using the no-intercept formula:

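x <- rnorm(100, 1000, 20)   # x and y are generated independently
y <- rnorm(100, 1000, 20)
cor(x, y)                   # sample correlation, for comparison

# rep(1,100) is a hand-made intercept column; "+ 0" removes R's own intercept,
# so summary() reports the no-intercept r-squared
summary( lm( y ~ rep(1,100) + x + 0 ) )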

Notice how big the r-squared value is (nowhere near the square of the correlation), even though x and y were generated independently.
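
To make the comparison concrete, here is a minimal sketch that computes both versions of r-squared by hand for the same kind of data. The seed and the names fit, rss, r2_uncentered and r2_centered are my own choices for illustration, not anything from the original example.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
x <- rnorm(100, 1000, 20)
y <- rnorm(100, 1000, 20)

# Intercept supplied by hand; "+ 0" removes R's own intercept term
fit <- lm(y ~ rep(1, 100) + x + 0)
rss <- sum(residuals(fit)^2)         # residual sum of squares

r2_uncentered <- 1 - rss / sum(y^2)               # null model: mean 0
r2_centered   <- 1 - rss / sum((y - mean(y))^2)   # null model: mean of y

# The reported value should agree with the uncentered version,
# and the centered version with the squared correlation.
c(reported    = summary(fit)$r.squared,
  uncentered  = r2_uncentered,
  centered    = r2_centered,
  cor_squared = cor(x, y)^2)
```

The fitted line is the same one that lm(y ~ x) would give; only the denominator used for r-squared changes.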

When you force the intercept to 0, you are using a different null model (mean 0). Part of Thomas's point was that if we still subtracted the mean in this case, the calculation of r-squared could give a negative number, which, as you pointed out, is meaningless. The gist is that subtracting the mean is the incorrect formula for this model, so R instead uses the formula without subtracting the mean when you don't fit an intercept.
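
A small sketch of the negative value mentioned above, using made-up data (my own, chosen so that y is roughly constant and far from 0, which makes a line through the origin fit worse than a horizontal line at the mean):

```r
x <- 1:20
y <- 10 + rep(c(-1, 1), 10)          # y hovers around 10, unrelated to x

fit0 <- lm(y ~ x + 0)                # regression through the origin
rss  <- sum(residuals(fit0)^2)

# Subtracting the mean anyway: the wrong null model for this fit
r2_centered <- 1 - rss / sum((y - mean(y))^2)

# No subtraction: the null model is mean 0, matching the fitted model
r2_uncentered <- 1 - rss / sum(y^2)

r2_centered                          # negative for these data
r2_uncentered                        # between 0 and 1
summary(fit0)$r.squared              # R reports the uncentered version
```

The centered value goes negative because the through-origin line has a larger residual sum of squares than the horizontal line at mean(y), a model it was never compared against when it was fitted.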

The r-squared values are different because they use different denominators, and they are therefore not comparable.

R uses two different formulas/denominators because no single formula/denominator makes general sense in both cases.
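
As a rough illustration of the rule quoted from the manual below (y* is the mean "if there is an intercept" and zero otherwise), here is a sketch of a helper that picks the denominator from the fitted model's terms. The function r2_by_null_model and the data are hypothetical, written only for this illustration, and it assumes an unweighted lm fit.

```r
# Hypothetical helper, not part of R: chooses y* the way the manual describes
r2_by_null_model <- function(fit) {
  y   <- model.response(model.frame(fit))
  rss <- sum(residuals(fit)^2)
  # terms() records whether the formula kept R's own intercept (1) or not (0)
  has_intercept <- attr(terms(fit), "intercept") == 1
  y_star <- if (has_intercept) mean(y) else 0
  1 - rss / sum((y - y_star)^2)
}

# Made-up data for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

with_int <- lm(y ~ x)       # y = a*x + b: denominator uses mean(y)
no_int   <- lm(y ~ x + 0)   # y = a*x + 0: denominator uses 0

c(by_hand = r2_by_null_model(with_int), reported = summary(with_int)$r.squared)
c(by_hand = r2_by_null_model(no_int),   reported = summary(no_int)$r.squared)
```

Here "there is an intercept" means R's own intercept term is in the formula, whatever its estimated value turns out to be; a model written as y ~ x + 0 gets the formula without the mean.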

Hope this helps,

--

Gregory (Greg) L. Snow Ph.D.

Statistical Data Center

Intermountain Healthcare

greg.snow_at_imail.org

801.408.8111

> -----Original Message-----

> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of derek
> Sent: Thursday, March 17, 2011 9:29 AM
> To: r-help_at_r-project.org
> Subject: Re: [R] Strange R squared, possible error
>
> That's exactly what I would like to do. Any idea on good text? I've
> consulted several texts, but no one defined R^2 as
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2-y*)) still less why to use
> different formulas for similar model or why should be R^2 closer to 1
> when y=a*x+0 than in general model y=a*x+b.
>
> from manual:
> r.squared R^2, the 'fraction of variance explained by the model',
> R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2),
> where y* is the mean of y[i] "if there is an intercept" and zero
> otherwise.
>
> I don't need explaining what R^2 does nor how to interpret it, because
> I know what it means and how it is derived. I don't need to be told
> which model I should apply. So the answers from Thomas weren't helpful.
>
> I don't claim it is wrong, otherwise wouldn't be employed, but I want
> to see the reason behind using two formulas.
>
> Control questions:
> 1) Statement "if there is an intercept" means intercept including zero
> intercept?
>
> 2) If I use model y = a*x+0 which formula for R^2 is used: the one with
> Y* or the one without?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384844.html
> Sent from the R help mailing list archive at Nabble.com.

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 17 Mar 2011 - 23:50:22 GMT.
