From: hadley wickham <h.wickham_at_gmail.com>

Date: Tue, 08 May 2007 11:08:34 +0200

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 08 May 2007 - 09:13:27 GMT

Dear all,

I'm struggling with weighted least squares, where something that I had assumed to be true appears not to be the case. Take the following data set as an example:

df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd=15)

I had expected that:

summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))

would be equivalent, but they are not. I suspect the difference is in how the degrees of freedom are calculated: I had expected them to be sum(weights), but they seem to be sum(weights > 0). This seems unintuitive to me:

summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
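
One way to check the suspicion about the degrees of freedom, as a rough sketch: compare df.residual() across the four fits. The object names (fit_w, fit_dup, fit_zero, fit_small, se_w, se_dup) and the seed are introduced here for illustration only and are not part of the original example.

```r
# Sketch: compare residual degrees of freedom across the four fits.
# All object names and the seed below are assumptions made for this example.
set.seed(1)  # arbitrary seed, only so the run is reproducible
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit_w     <- lm(y ~ x, data = df, weights = rep(2, 100))
fit_dup   <- lm(y ~ x, data = rbind(df, df))
fit_zero  <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit_small <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))

# Residual df follows the number of rows with non-zero weight,
# not sum(weights)
df.residual(fit_w)      # 100 - 2
df.residual(fit_dup)    # 200 - 2
df.residual(fit_zero)   # 50 - 2
df.residual(fit_small)  # 100 - 2

# The point estimates agree between the weighted and duplicated fits
all.equal(coef(fit_w), coef(fit_dup))

# The standard errors differ only through the residual df, so rescaling
# by the df ratio should reproduce the duplicated-data standard errors
se_w   <- coef(summary(fit_w))[, "Std. Error"]
se_dup <- coef(summary(fit_dup))[, "Std. Error"]
all.equal(se_w * sqrt(df.residual(fit_w) / df.residual(fit_dup)), se_dup)
```

If both all.equal() calls return TRUE, the two summaries differ only in the residual degrees of freedom, which is consistent with lm() treating the weights as precision weights rather than as case counts.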

What am I missing? And what is the usual way to do a linear regression when you have aggregated data?

Thanks,

Hadley

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
