From: John Fox <jfox_at_mcmaster.ca>

Date: Wed, 09 May 2007 07:16:37 -0400

R-help_at_stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 09 May 2007 - 11:23:41 GMT

Dear Hadley,

> -----Original Message-----
> From: hadley wickham [mailto:h.wickham_at_gmail.com]
> Sent: Wednesday, May 09, 2007 2:21 AM
> To: John Fox
> Cc: R-help_at_stat.math.ethz.ch
> Subject: Re: [R] Weighted least squares
>
> Thanks John,
>
> That's just the explanation I was looking for. I had hoped
> that there would be a built-in way of dealing with them in
> R, but obviously not.
>
> Given that explanation, it still seems to me that the way R
> calculates n is suboptimal, as demonstrated by my second example:
>
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> The weights are only very slightly different, but the
> estimates of residual standard error are quite different (20
> vs 14 in my run).
>

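Observations with zero weight are literally excluded, while those with very small weight (relative to the others) are retained but contribute little to the fit. Consequently you get very similar coefficients but different numbers of observations, and hence different residual degrees of freedom (48 vs. 98 in your example) and different estimates of the residual standard error.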
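A minimal sketch of the point above, reusing the simulated data from Hadley's post (the `set.seed()` call is an addition for reproducibility). It compares the residual degrees of freedom that `lm()` reports for the two weight vectors and recomputes the residual standard error by hand as the square root of the weighted residual sum of squares over the residual degrees of freedom, which is how `summary.lm()` computes `sigma`. The helper `rse()` is a hypothetical name used only for illustration.

```r
set.seed(1)  # added for reproducibility; not in the original post
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit_zero  <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit_small <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))

# Zero-weight rows are dropped from the fit, so only 50 observations
# count toward the residual degrees of freedom: 50 - 2 = 48.
df.residual(fit_zero)
# Rows with weight 0.01 are kept, so all 100 count: 100 - 2 = 98.
df.residual(fit_small)

# Hypothetical helper: residual standard error = sqrt(sum(w * e^2) / residual df).
# The 0.01-weighted rows add almost nothing to the numerator but roughly
# double the denominator, which shrinks the estimate.
rse <- function(fit) {
  w <- weights(fit)
  e <- residuals(fit)
  sqrt(sum(w * e^2) / df.residual(fit))
}
c(zero = rse(fit_zero), small = rse(fit_small))
c(zero = summary(fit_zero)$sigma, small = summary(fit_small)$sigma)

# The coefficients, by contrast, are close.
rbind(zero = coef(fit_zero), small = coef(fit_small))
```

The hand-computed values should match `summary()$sigma`, which makes it visible that the gap between the two fits comes from the denominator (the observation count), not from the fitted line.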

I hope this helps,

John

> Hadley

>
> On 5/8/07, John Fox <jfox_at_mcmaster.ca> wrote:
> > Dear Hadley,
> >
> > I think that the problem is that the term "weights" has different
> > meanings, which, although they are related, are not quite the same.
> >
> > The weights used by lm() are (inverse-)"variance weights," reflecting
> > the variances of the errors, with observations that have low-variance
> > errors therefore being accorded greater weight in the resulting WLS
> > regression. What you have are sometimes called "case weights," and
> > I'm unaware of a general way of handling them in R, although you
> > could regenerate the unaggregated data. As you discovered, you get
> > the same coefficients with case weights as with variance weights,
> > but different standard errors. Finally, there are "sampling
> > weights," which are inversely proportional to the probability of
> > selection; these are accommodated by the survey package.
> >
> > To complicate matters, this terminology isn't entirely standard.
> >
> > I hope this helps,
> > John
> >
> > --------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces_at_stat.math.ethz.ch
> > > [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of hadley wickham
> > > Sent: Tuesday, May 08, 2007 5:09 AM
> > > To: R Help
> > > Subject: [R] Weighted least squares
> > >
> > > Dear all,
> > >
> > > I'm struggling with weighted least squares, where something that I
> > > had assumed to be true appears not to be the case.
> > > Take the following data set as an example:
> > >
> > > df <- data.frame(x = runif(100, 0, 100))
> > > df$y <- df$x + 1 + rnorm(100, sd=15)
> > >
> > > I had expected that:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> > > summary(lm(y ~ x, data=rbind(df,df)))
> > >
> > > would be equivalent, but they are not. I suspect the difference is
> > > how the degrees of freedom are calculated - I had expected it to be
> > > sum(weights), but it seems to be sum(weights > 0). This seems
> > > unintuitive to me:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> > > summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
> > >
> > > What am I missing? And what is the usual way to do a linear
> > > regression when you have aggregated data?
> > >
> > > Thanks,
> > >
> > > Hadley
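For the aggregated-data question in this thread, here is a minimal sketch, assuming the weights are whole-number frequency (case) counts. It follows John's suggestion of regenerating the unaggregated data and, as an alternative, rescales the standard errors from the variance-weighted `lm()` fit so that the residual degrees of freedom become sum(w) - p instead of (number of nonzero weights) - p. The helper names `expand_cases()` and `case_weight_se()` are hypothetical, and the data are simulated as in Hadley's post with a `set.seed()` call added.

```r
set.seed(1)  # added for reproducibility; not in the original post
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

w <- rep(2, 100)  # case weights: each row stands for two identical cases

# Hypothetical helper 1: regenerate the unaggregated data by repeating
# each row w[i] times (assumes w holds non-negative whole numbers).
expand_cases <- function(data, w) {
  data[rep(seq_len(nrow(data)), times = w), , drop = FALSE]
}
fit_case <- lm(y ~ x, data = expand_cases(df, w))  # here equivalent to rbind(df, df)

# Hypothetical helper 2: keep lm()'s variance-weight fit, but rescale the
# standard errors so sigma^2 uses sum(w) - p residual degrees of freedom.
# lm() uses (number of nonzero weights) - p, which is df.residual(fit).
fit_var <- lm(y ~ x, data = df, weights = w)
case_weight_se <- function(fit) {
  w <- weights(fit)
  p <- fit$rank
  scale <- sqrt(df.residual(fit) / (sum(w) - p))
  sqrt(diag(vcov(fit))) * scale
}

cbind(expanded = coef(summary(fit_case))[, "Std. Error"],
      rescaled = case_weight_se(fit_var))
```

Because the expanded design has the same weighted cross-product matrix and the same residuals as the weighted fit, both approaches give identical coefficients and standard errors. The rescaling avoids building the expanded data frame, but it is only meaningful when the weights really are frequency counts, not inverse variances or sampling weights.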

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 09 May 2007 - 13:31:34 GMT.
