From: Peter Dalgaard <pdalgd_at_gmail.com>

Date: Thu, 29 Jul 2010 08:10:59 +0200

Peter Dalgaard wrote:
> William Dunlap wrote:
>> In modelling functions some people like to use
>> a weight of 0 to drop an observation instead of
>> using a subset value of FALSE. E.g.,
>>   weights=c(0,1,1,...)
>> instead of
>>   subset=c(FALSE, TRUE, TRUE, ...)
>> to drop the first observation.
>>
>> lm() and summary.lm() appear to treat these in the
>> same way, decrementing the number of degrees of
>> freedom for each dropped observation. However,
>> predict.lm() does not treat them the same. It
>> doesn't seem to diminish the df to account for the
>> 0-weighted observations. E.g., the last printout
>> from the following script is as follows, where
>> predw is the prediction from the fit that used
>> 0-weights and preds is from using FALSE's in the
>> subset argument. Is this difference proper?
>
> Nice catch.
>
> The issue is that the subset fit and the zero-weighted fit are not
> completely the same. Notice that the residuals vector has different
> length in the two analyses. With a simplified setup:
>
> > length(lm(y~1,weights=w)$residuals)
> [1] 10
> > length(lm(y~1,subset=-1)$residuals)
> [1] 9
> > w
>  [1] 0 1 1 1 1 1 1 1 1 1
>
> This in turn is what confuses predict.lm because it gets n and residual
> df from length(object$residuals). summary.lm() uses NROW(Qr$qr), and I
> suppose that predict.lm should follow suit.
>

...and then when I went to fix it, I found that the actual line in the sources (stats/R/lm.R) reads

27442 ripley n <- length(object$residuals) # NROW(object$qr$qr)

so it's been like that since December 2003. I wonder if Brian remembers what the point was? (27442 was the restructuring into the stats package, so it might not actually be Brian's code).
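For anyone who wants to reproduce the length mismatch described above, here is a minimal R sketch. The data and the names `y`, `w`, `fit_w` and `fit_s` are illustrative assumptions, not the original script. It only inspects the quantities named in this message; what predict.lm() does with them may differ between R versions, so no interval output is shown.

```r
# Illustrative data (assumed, not from the original script):
# 10 observations, the first one given weight 0.
set.seed(1)
y <- rnorm(10)
w <- c(0, rep(1, 9))

fit_w <- lm(y ~ 1, weights = w)  # observation 1 kept, but with zero weight
fit_s <- lm(y ~ 1, subset = -1)  # observation 1 dropped via subset

# Residual vectors: the zero-weighted fit keeps an entry for every row,
# the subset fit only for the rows that were actually used.
length(fit_w$residuals)
length(fit_s$residuals)

# Number of rows in the QR decomposition, the quantity summary.lm() is
# said to use above; zero-weight rows are not part of it.
NROW(fit_w$qr$qr)
NROW(fit_s$qr$qr)

# Residual degrees of freedom stored on each fit.
df.residual(fit_w)
df.residual(fit_s)
```

The two residual lengths should match the 10 and 9 quoted above, while the QR row counts and the stored residual df agree between the two fits.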

-pd

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 29 Jul 2010 - 06:13:24 GMT
