From: Jari Oksanen <jarioksa_at_sun3.oulu.fi>

Date: Wed 06 Apr 2005 - 16:54:06 EST

On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:

> Please forgive a straight stats question, and the informal notation.
>
> Let us say we wish to perform a linear regression:
> y = b0 + b1*x + b2*z
>
> There are two ways this can be done: the usual way, as a single
> regression,
> fit1<-lm(y~x+z)
> or by doing two regressions. In the first regression we could have y as
> the dependent variable and x as the independent variable
> fit2<-lm(y~x).
> The second regression would be a regression in which the residuals from
> the first regression would be the dependent variable, and the
> independent variable would be z.
> fit2<-lm(fit2$residuals~z)
>
> I would think the two methods would give the same p value and the same
> beta coefficient for z. They don't. Can someone help me understand why
> the two methods do not give the same results? Additionally, could
> someone tell me when one method might be better than the other, i.e.
> what question does the first method answer, and what question does the
> second method answer? I have searched a number of textbooks and have not
> found this question addressed.

John,

Bill Venables already told you that the two approaches do not give the same result, because x and z are not orthogonal. Here is a simpler way of getting the same result as his suggestion gives for the coefficient of z (but only for z):

> x <- runif(100)
> z <- x + rnorm(100, sd=0.4)
> y <- 3 + x + z + rnorm(100, sd=0.3)
> mod <- lm(y ~ x + z)
> mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)

> summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.96436    0.06070  48.836  < 2e-16 ***
x            0.96272    0.11576   8.317 5.67e-13 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

> summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731    0.06070  -2.592   0.0110 *
x           -0.84459    0.11576  -7.296 8.13e-11 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, even though you have already removed the effect of x from y. In the orthogonal case the coefficient for x would be 0. The residuals are equal in these two models:

> range(residuals(mod) - residuals(mod2))

[1] -2.797242e-17  5.551115e-17

But, of course, the fitted values are not equal, since mod2 is fitted to the residuals left after removing the effect of x.

cheers, jari oksanen

-- 
Jari Oksanen <jarioksa@sun3.oulu.fi>

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
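The two points made in the message (x can be dropped from the outer lm only in the orthogonal case, and the fitted values differ although the residuals agree) can be checked numerically. What follows is only a sketch and not part of the original session: the seed and the names z_orth, mod3, mod4, coef_full, coef_orth, x_coef_orth and fit_gap are assumptions introduced here for illustration. It makes the second regressor orthogonal to x by replacing z with its residuals from a regression on x, and then compares coefficients and fitted values.

```r
set.seed(1)  # assumed seed, used only to make the sketch reproducible
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

mod  <- lm(y ~ x + z)
mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)

# Hypothetical helper: the part of z that is orthogonal to x (and the intercept)
z_orth <- residuals(lm(z ~ x))

# With an orthogonal regressor, x can be left out of the outer lm
mod3 <- lm(residuals(lm(y ~ x)) ~ z_orth)
coef_full <- unname(coef(mod)["z"])
coef_orth <- unname(coef(mod3)["z_orth"])
all.equal(coef_full, coef_orth)

# If x is kept anyway, its coefficient is zero up to rounding error
mod4 <- lm(residuals(lm(y ~ x)) ~ x + z_orth)
x_coef_orth <- unname(coef(mod4)["x"])

# The fitted values of mod and mod2 differ by the fitted values of lm(y ~ x)
fit_gap <- fitted(mod) - fitted(mod2) - fitted(lm(y ~ x))
range(fit_gap)
```

Note that mod3 has 98 rather than 97 residual degrees of freedom, so its standard error, t value and p value for z_orth should be close to, but not identical with, those for z in mod.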
