From: Bert Gunter <gunter.berton_at_gene.com>

Date: Wed 03 Jan 2007 - 22:19:39 GMT

Ravi:

You misinterpreted my reply -- perhaps I was unclear. I did **not** say that lm() with a matrix response would do it, but that the apply construction or an explicit loop would. As you and the poster noted, lm() produces a separate fit to each column of only the rowwise complete data.

Bert Gunter
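
The per-column approach mentioned above (the apply construction or an explicit loop) can be sketched roughly as follows. This is only an illustration, not code from the thread: it reuses the toy data from the quoted message below, and the seed, the data frame `dat`, and the names `responses` and `fits` are assumptions introduced here. It also assumes the default `options("na.action")` of `na.omit`.

```r
# Toy data modelled on the quoted example; the seed is an arbitrary assumption.
set.seed(1)
x  <- rnorm(20)
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))
dat <- data.frame(x = x, y1 = y1, y2 = y2)

# Fit each response on its own. With the default na.action (na.omit),
# each call drops only the rows where that particular response is missing.
responses <- c("y1", "y2")
fits <- lapply(responses, function(nm) {
  lm(reformulate("x", response = nm), data = dat)
})
names(fits) <- responses

# Residual degrees of freedom per fit: 18 for y1 (20 rows), 15 for y2 (17 rows)
sapply(fits, df.residual)
```

Building the formula per column keeps each fit's call readable (`y1 ~ x`, `y2 ~ x`), which is a design choice of this sketch rather than a requirement.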

-----Original Message-----

From: Ravi Varadhan [mailto:rvaradhan@jhmi.edu]
Sent: Wednesday, January 03, 2007 2:15 PM
To: 'Bert Gunter'; 'Talbot Katz'; r-help@stat.math.ethz.ch
Subject: RE: [R] na.action and simultaneous regressions

No, Bert, lm doesn't produce a list each of whose components is a separate fit using "all" the nonmissing data in the column. It is true that the regressions are performed independently, but when the response matrix is passed from "lm" on to "lm.fit", only the complete rows are passed, i.e. rows with no missing values. I looked at the "lm" function, but it was not obvious to me how to fix it.

In the following toy example, the degrees of freedom for the y1 regression should be 18 and those for the y2 regression should be 15, but both regressions report only 15.

> y1 <- runif(20)
> y2 <- c(runif(17), rep(NA,3))
> x <- rnorm(20)
> summary(lm(cbind(y1,y2) ~ x))

Response y1 :

Call:
lm(formula = y1 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52592 -0.22632 -0.00964  0.25117  0.31227

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.56989    0.06902   8.257 5.82e-07 ***
x           -0.12325    0.06516  -1.891    0.078 .

---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2798 on 15 degrees of freedom
Multiple R-Squared: 0.1926, Adjusted R-squared: 0.1387
F-statistic: 3.577 on 1 and 15 DF, p-value: 0.07804

Response y2 :

Call:
lm(formula = y2 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48880 -0.28552 -0.06022  0.23167  0.54425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.43712    0.07686   5.687 4.31e-05 ***
x            0.10278    0.07257   1.416    0.177

---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3115 on 15 degrees of freedom
Multiple R-Squared: 0.118, Adjusted R-squared: 0.05915
F-statistic: 2.006 on 1 and 15 DF, p-value: 0.1771

Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
------------------------------------------------------------------------------------

-----Original Message-----

From: r-help-bounces@stat.math.ethz.ch [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Bert Gunter
Sent: Wednesday, January 03, 2007 4:46 PM
To: 'Talbot Katz'; r-help@stat.math.ethz.ch
Subject: Re: [R] na.action and simultaneous regressions

As the Help page says:

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix

So there's nothing hidden going on "behind the scenes," and

apply(cbind(y1,y2),2,function(z)lm(z~x))

(or an explicit loop, of course) will produce a list each of whose components is a separate fit using all the nonmissing data in the column.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-----Original Message-----

From: r-help-bounces@stat.math.ethz.ch [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Talbot Katz
Sent: Wednesday, January 03, 2007 11:56 AM
To: r-help@stat.math.ethz.ch
Subject: [R] na.action and simultaneous regressions

Hi.

I am running regressions of several dependent variables using the same set of independent variables. The independent variable values are complete, but each dependent variable has some missing values for some observations; by default, lm(y1~x) will carry out the regressions using only the observations without missing values of y1. If I do lm(cbind(y1,y2)~x), the default will be to use only the observations for which neither y1 nor y2 is missing. I'd like to have the regression for each separate dependent variable use all the non-missing cases for that variable. I would think that there should be a way to do that using the na.action option, but I haven't seen this in the documentation or figured out how to do it on my own. Can it be done this way, or do I have to code the regressions in a loop? (By the way, since it restricts to non-missing values in all the variables simultaneously, is this because it's doing some sort of SUR or other simultaneous equation estimation behind the scenes?)

Thanks!

-- TMK --
212-460-5430 home
917-656-5351 cell

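
On the na.action question above, a minimal sketch of what the matrix-response fit appears to do. This is not code from the thread: the seed and the names `fit_omit` and `fit_excl` are assumptions, and it presumes the default `options("na.action")` of `na.omit`. It compares the default with `na.action = na.exclude`; both drop every row in which any response column is missing, so switching between them does not give per-column missing-value handling.

```r
# Toy data modelled on the example in the thread; the seed is an arbitrary assumption.
set.seed(1)
x  <- rnorm(20)
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))

# Matrix response under the default na.action (na.omit)
fit_omit <- lm(cbind(y1, y2) ~ x)

# Same model with na.exclude: residuals() is padded with NA,
# but the fit itself still uses the same 17 complete rows
fit_excl <- lm(cbind(y1, y2) ~ x, na.action = na.exclude)

df.residual(fit_omit)                      # 15
df.residual(fit_excl)                      # 15
all.equal(coef(fit_omit), coef(fit_excl))  # TRUE

# A single-response fit keeps every row where y1 is observed
df.residual(lm(y1 ~ x))                    # 18
```

This is consistent with the replies in the thread: a loop (or apply/lapply) over the response columns is the straightforward route.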
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu Jan 04 09:36:38 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 03 Jan 2007 - 23:30:25 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.