Re: [R] Weighted least squares

From: John Fox <jfox_at_mcmaster.ca>
Date: Tue, 08 May 2007 11:19:16 -0400


Dear Hadley,

I think the problem is that the term "weights" has several related but distinct meanings.

The weights used by lm() are (inverse-)"variance weights," reflecting the variances of the errors, with observations that have low-variance errors therefore being accorded greater weight in the resulting WLS regression. What you have are sometimes called "case weights," and I'm unaware of a general way of handling them in R, although you could regenerate the unaggregated data. As you discovered, you get the same coefficients with case weights as with variance weights, but different standard errors, because lm() bases the residual degrees of freedom on the number of observations with nonzero weight rather than on the sum of the weights. Finally, there are "sampling weights," which are inversely proportional to the probability of selection; these are accommodated by the survey package.
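
As an illustration of the distinction, here is a minimal sketch in base R. The seed, the object names (fit_w, fit_dup, agg, full) and the small aggregated data set with its "count" column are assumptions made up for the example; they are not from the original question. It fits the same model with constant variance weights and with duplicated rows, then shows one way of regenerating unaggregated data from counts.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
n <- 100
df <- data.frame(x = runif(n, 0, 100))
df$y <- df$x + 1 + rnorm(n, sd = 15)

# Variance weights: lm() reads each weight as an inverse error variance
fit_w <- lm(y ~ x, data = df, weights = rep(2, n))

# Case weights of 2 correspond to fitting the duplicated (unaggregated) data
fit_dup <- lm(y ~ x, data = rbind(df, df))

# The coefficients agree ...
all.equal(coef(fit_w), coef(fit_dup))

# ... but the residual degrees of freedom do not: n - 2 versus 2n - 2
df.residual(fit_w)
df.residual(fit_dup)

# so the standard errors differ by the factor sqrt((n - 2) / (2n - 2))
se_w   <- sqrt(diag(vcov(fit_w)))
se_dup <- sqrt(diag(vcov(fit_dup)))
se_dup / se_w
sqrt((n - 2) / (2 * n - 2))

# Hypothetical aggregated data: each row stands for 'count' identical cases
agg <- data.frame(x     = c(1, 2, 3, 4),
                  y     = c(2.1, 3.9, 6.2, 7.8),
                  count = c(3, 1, 2, 4))

# Regenerate the unaggregated data by repeating each row 'count' times
full <- agg[rep(seq_len(nrow(agg)), times = agg$count), c("x", "y")]

fit_full <- lm(y ~ x, data = full)                  # case-weight analysis
fit_agg  <- lm(y ~ x, data = agg, weights = count)  # variance-weight analysis
```

Here fit_full and fit_agg again share coefficients, but only fit_full uses the residual degrees of freedom (and hence standard errors) appropriate to the full set of cases.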

To complicate matters, this terminology isn't entirely standard.

I hope this helps,
 John



John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of hadley wickham
> Sent: Tuesday, May 08, 2007 5:09 AM
> To: R Help
> Subject: [R] Weighted least squares
>
> Dear all,
>
> I'm struggling with weighted least squares, where something
> that I had assumed to be true appears not to be the case.
> Take the following data set as an example:
>
> df <- data.frame(x = runif(100, 0, 100))
> df$y <- df$x + 1 + rnorm(100, sd=15)
>
> I had expected that:
>
> summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> summary(lm(y ~ x, data=rbind(df,df)))
>
> would be equivalent, but they are not. I suspect the
> difference is in how the degrees of freedom are calculated - I
> had expected them to be based on sum(weights), but they seem
> to be based on sum(weights > 0). This seems unintuitive to me:
>
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> What am I missing? And what is the usual way to do a linear
> regression when you have aggregated data?
>
> Thanks,
>
> Hadley
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Tue 08 May 2007 - 15:32:33 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 09 May 2007 - 07:31:24 GMT.
