# Re: [Rd] problem with zero-weighted observations in predict.lm?

From: Peter Dalgaard <pdalgd_at_gmail.com>
Date: Thu, 29 Jul 2010 08:10:59 +0200

Peter Dalgaard wrote:

```
> William Dunlap wrote:
>> In modelling functions some people like to use
>> a weight of 0 to drop an observation instead of
>> using a subset value of FALSE. E.g.,
>>    weights=c(0,1,1,...)
>>    subset=c(FALSE, TRUE, TRUE, ...)
>> to drop the first observation.
>>
>> lm() and summary.lm() appear to treat these in the
>> same way, decrementing the number of degrees of
>> freedom for each dropped observation. However,
>> predict.lm() does not treat them the same. It
>> doesn't seem to diminish the df to account for the
>> 0-weighted observations. E.g., the last printout
>> from the following script is as follows, where
>> predw is the prediction from the fit that used
>> 0-weights and preds is from using FALSE's in the
>> subset argument. Is this difference proper?
>
> Nice catch.
>
> The issue is that the subset fit and the zero-weighted fit are not
> completely the same. Notice that the residuals vector has different
> length in the two analyses. With a simplified setup:
>
> > length(lm(y~1,weights=w)$residuals)
> [1] 10
> > length(lm(y~1,subset=-1)$residuals)
> [1] 9
> > w
>  [1] 0 1 1 1 1 1 1 1 1 1
>
> This in turn is what confuses predict.lm because it gets n and residual
> df from length(object$residuals). summary.lm() uses NROW(Qr$qr), and I
> suppose that predict.lm should follow suit.
```
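The length mismatch described in the quoted message is easy to reproduce. The sketch below is illustrative only: the data in `y` are made up (the message does not show them), and the names `fit_w`, `fit_s`, and `naive_df` are hypothetical. It compares the lengths of the residual vectors with the residual degrees of freedom that lm() stores in the fitted object.

```r
# Made-up data: the original message does not show y.
set.seed(1)
y <- rnorm(10)
w <- c(0, rep(1, 9))               # zero weight on the first observation

fit_w <- lm(y ~ 1, weights = w)    # drop observation 1 via a zero weight
fit_s <- lm(y ~ 1, subset = -1)    # drop observation 1 via subset

# The residual vectors differ in length (10 vs 9, as in the message) ...
len_w <- length(fit_w$residuals)
len_s <- length(fit_s$residuals)

# ... while the residual df stored by lm() agree between the two fits.
df_w <- df.residual(fit_w)
df_s <- df.residual(fit_s)

# Residual df derived from the residual length, which is how the message
# describes predict.lm obtaining it: one too many for the weighted fit.
naive_df <- len_w - fit_w$rank

print(c(len_w = len_w, len_s = len_s, df_w = df_w, df_s = df_s,
        naive_df = naive_df))
```

The two fits give the same coefficient estimate; only the bookkeeping for the dropped observation differs.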

...and then when I went to fix it, I found that the actual line in the sources (stats/R/lm.R) reads

`27442     ripley     n <- length(object$residuals) # NROW(object$qr$qr)`

so it's been like that since December 2003. I wonder if Brian remembers what the point was? (27442 was the restructuring into the stats package, so it might not actually be Brian's code).
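For comparison, the two candidate expressions for n on that line can be evaluated side by side on a zero-weighted fit. This is a sketch under assumptions, not a proposed patch: the data are made up, and `n_resid`, `n_qr`, and `p` are hypothetical names.

```r
# Made-up data with a zero weight on the first observation.
set.seed(42)
y <- rnorm(10)
w <- c(0, rep(1, 9))
fit <- lm(y ~ 1, weights = w)

n_resid <- length(fit$residuals)   # expression used on the quoted line
n_qr    <- NROW(fit$qr$qr)         # expression left in the trailing comment
p       <- fit$rank

# Residual df implied by each choice of n, next to the df stored by lm().
print(c(from_resid = n_resid - p,
        from_qr    = n_qr - p,
        stored     = fit$df.residual))
```

Only the QR-based count matches the stored residual df here, which is consistent with summary.lm() handling zero weights as expected.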

-pd

```
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
```
Received on Thu 29 Jul 2010 - 06:13:24 GMT
