RE: [R] two methods for regression, two different results

From: <Bill.Venables_at_csiro.au>
Date: Wed 06 Apr 2005 - 13:25:58 EST


This is possible if x and z are orthogonal, but in general it doesn't work, as you have noted. (If it did work, it would almost amount to a way of inverting general square matrices by working one row at a time, with no going back...)
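
A minimal sketch of the mismatch, using simulated data (the seed, sample size and true coefficients are arbitrary assumptions, not from the original question). With an intercept in every fit, the two-step coefficient on z works out to the full-model coefficient shrunk by the factor 1 - cor(x, z)^2, so the two agree only when x and z are uncorrelated in the sample.

```r
# Simulated data in which z is correlated with x (all values are illustrative).
set.seed(1)
n <- 100
x <- rnorm(n)
z <- 0.7 * x + rnorm(n)
y <- 1 + 2 * x + 3 * z + rnorm(n)

# Coefficient on z from the single regression.
b2_full <- coef(lm(y ~ x + z))[["z"]]

# Coefficient on z from the two-step approach: residuals of y ~ x regressed on z.
ryx <- resid(lm(y ~ x))
b2_naive <- coef(lm(ryx ~ z))[["z"]]

# Shrinkage factor linking the two estimates.
shrink <- 1 - cor(x, z)^2

print(c(full = b2_full, naive = b2_naive, full_times_shrink = b2_full * shrink))
```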

It is possible to fit a bivariate regression using simple linear regression techniques iteratively like this, but it is a bit more involved than your two step process.

  1. regress y on x and take the residuals: ryx <- resid(lm(y ~ x))
  2. regress z on x and take the residuals: rzx <- resid(lm(z ~ x))
  3. regress ryx on rzx: fitz <- lm(ryx ~ rzx)
  4. this gives you the estimate of the coefficient on z (what you call below b2): b2 <- coef(fitz)[2]
  5. regress y - b2*z on x: fitx <- lm(I(y - b2*z) ~ x)

This last step gets you the estimates of b0 and b1.
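
The five steps above can be checked against the single regression with a short sketch. The data are simulated and every number in the setup is an arbitrary assumption; only the five lm() calls come from the steps themselves.

```r
# Simulated data with correlated predictors (illustrative values only).
set.seed(42)
n <- 200
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)
y <- 1 + 2 * x - 1.5 * z + rnorm(n)

# Steps 1-2: remove x from both y and z.
ryx <- resid(lm(y ~ x))
rzx <- resid(lm(z ~ x))

# Steps 3-4: the slope of ryx on rzx is the coefficient on z.
fitz <- lm(ryx ~ rzx)
b2 <- coef(fitz)[2]

# Step 5: with b2 fixed, a simple regression on x gives b0 and b1.
fitx <- lm(I(y - b2 * z) ~ x)

# Compare (b0, b1, b2) with the single regression.
stepwise <- c(unname(coef(fitx)), unname(b2))
direct <- unname(coef(lm(y ~ x + z)))
print(rbind(stepwise, direct))
print(all.equal(stepwise, direct))  # agreement up to numerical tolerance
```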

None of this works with significances, though, because in going about it this way you have essentially disguised the degrees of freedom involved. So you can get the right estimates, but the standard errors, t-statistics and residual variances are all somewhat inaccurate (though usually not by much).
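
To see where the degrees of freedom go astray, here is a sketch on simulated data (the setup values are assumptions for illustration). The residuals from step 3 are the same as those of the full model, but step 3 thinks it has estimated only two parameters, so it reports n - 2 residual degrees of freedom instead of n - 3, and the standard error of b2 comes out too small by the factor sqrt((n - 3)/(n - 2)).

```r
# Small simulated data set so the degrees-of-freedom effect is visible.
set.seed(7)
n <- 30
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)
y <- 1 + 2 * x + 3 * z + rnorm(n)

full <- lm(y ~ x + z)

# Step 3 of the iterative scheme: residuals on residuals.
ryx <- resid(lm(y ~ x))
rzx <- resid(lm(z ~ x))
fitz <- lm(ryx ~ rzx)

# Residual degrees of freedom: n - 3 for the full fit, n - 2 for fitz.
df_full <- df.residual(full)
df_fitz <- df.residual(fitz)

# Standard errors of the coefficient on z from each fit.
se_full <- summary(full)$coefficients["z", "Std. Error"]
se_fitz <- summary(fitz)$coefficients[2, "Std. Error"]

print(c(df_full = df_full, df_fitz = df_fitz))
print(c(se_ratio = se_fitz / se_full, expected = sqrt((n - 3) / (n - 2))))
```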

If x and z are orthogonal the (curious looking) step 2 is not needed.
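
A sketch of the orthogonal case, again on simulated data. Here z is constructed (an assumption made purely for the demonstration) as the residuals of a random vector regressed on x, which makes it exactly uncorrelated with x in the sample; regressing ryx directly on z then reproduces the full-model coefficient.

```r
# Build a z that is exactly orthogonal to x (and to the intercept) in the sample.
set.seed(3)
n <- 100
x <- rnorm(n)
z0 <- rnorm(n)
z <- resid(lm(z0 ~ x))
y <- 1 + 2 * x + 3 * z + rnorm(n)

# Coefficient on z from the single regression.
b2_full <- coef(lm(y ~ x + z))[["z"]]

# Step 2 skipped: regress the residuals of y ~ x straight on z.
ryx <- resid(lm(y ~ x))
b2_short <- coef(lm(ryx ~ z))[["z"]]

print(c(full = b2_full, short = b2_short))
```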

This kind of idea lies behind some algorithms (e.g. Stevens' algorithm) for fitting very large regressions essentially by iterative processes to avoid constructing a huge model matrix.

Bill Venables

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of John Sorkin
Sent: Wednesday, 6 April 2005 12:55 PM
To: r-help@stat.math.ethz.ch
Subject: [R] two methods for regression, two different results

Please forgive a straight stats question, and the informal notation.  

Let us say we wish to perform a linear regression: y=b0 + b1*x + b2*z

There are two ways this can be done, the usual way, as a single regression,
fit1<-lm(y~x+z)
or by doing two regressions. In the first regression we could have y as the dependent variable and x as the independent variable,
fit2<-lm(y~x)
The second regression would be a regression in which the residuals from the first regression would be the dependent variable, and the independent variable would be z.

fit2<-lm(fit2$residuals~z)  

I would think the two methods would give the same p value and the same beta coefficient for z. They don't. Can someone help me understand why the two methods do not give the same results? Additionally, could someone tell me when one method might be better than the other, i.e. what question does the first method answer, and what question does the second method answer? I have searched a number of textbooks and have not found this question addressed.

Thanks,
John  

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC  

University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524  

410-605-7119
-- NOTE NEW EMAIL ADDRESS:
jsorkin@grecc.umaryland.edu




R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Wed Apr 06 13:30:42 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:02 EST