# Re: [R] Weighted least squares

From: John Fox <jfox_at_mcmaster.ca>
Date: Wed, 09 May 2007 07:16:37 -0400

> -----Original Message-----
> Sent: Wednesday, May 09, 2007 2:21 AM
> To: John Fox
> Cc: R-help_at_stat.math.ethz.ch
> Subject: Re: [R] Weighted least squares
>
> Thanks John,
>
> That's just the explanation I was looking for. I had hoped
> that there would be a built in way of dealing with them with
> R, but obviously not.
>
> Given that explanation, it still seems to me that the way R
> calculates n is suboptimal, as demonstrated by my second example:
>
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> the weights are only very slightly different but the
> estimates of residual standard error are quite different (20
> vs 14 in my run)
>

Observations with 0 weight are literally excluded, while those with very small weight (relative to others) don't contribute much to the fit. Consequently you get very similar coefficients but different numbers of observations, and hence different residual degrees of freedom and different residual standard errors.
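As a rough illustration of the point above, the following sketch refits the two models from the quoted example and compares their residual degrees of freedom. The seed and the names `fit_zero` and `fit_small` are assumptions added here for reproducibility; they are not part of the original posts.

```r
# Simulated data as in the original question; the seed is an arbitrary choice.
set.seed(1)
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

# First 50 observations get weight 0 in one fit and 0.01 in the other.
fit_zero  <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit_small <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))

# Zero-weight rows are dropped, so only 50 observations are counted.
df.residual(fit_zero)   # 48 = 50 - 2
df.residual(fit_small)  # 98 = 100 - 2

# The coefficients are close, but the residual standard errors differ,
# mainly because a similar weighted residual sum of squares is divided
# by 48 in one case and by 98 in the other.
rbind(zero = coef(fit_zero), small = coef(fit_small))
c(zero = summary(fit_zero)$sigma, small = summary(fit_small)$sigma)
```

The exact standard errors depend on the simulated draw, so no particular values are claimed here beyond the degrees of freedom.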

I hope this helps,
John

>
> On 5/8/07, John Fox <jfox_at_mcmaster.ca> wrote:
> >
> > I think that the problem is that the term "weights" has different
> > meanings, which, although they are related, are not quite the same.
> >
> > The weights used by lm() are (inverse-)"variance weights," reflecting
> > the variances of the errors, with observations that have low-variance
> > errors therefore being accorded greater weight in the resulting WLS
> > regression. What you have are sometimes called "case weights," and I'm
> > unaware of a general way of handling them in R, although you could
> > regenerate the unaggregated data. As you discovered, you get the same
> > coefficients with case weights as with variance weights, but different
> > standard errors. Finally, there are "sampling weights," which are
> > inversely proportional to the probability of selection; these are
> > accommodated by the survey package.
> >
> > To complicate matters, this terminology isn't entirely standard.
> >
> > I hope this helps,
> > John
> >
> > --------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces_at_stat.math.ethz.ch
> > > [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of hadley
> > > wickham
> > > Sent: Tuesday, May 08, 2007 5:09 AM
> > > To: R Help
> > > Subject: [R] Weighted least squares
> > >
> > > Dear all,
> > >
> > > I'm struggling with weighted least squares, where something that I
> > > had assumed to be true appears not to be the case.
> > > Take the following data set as an example:
> > >
> > > df <- data.frame(x = runif(100, 0, 100))
> > > df$y <- df$x + 1 + rnorm(100, sd=15)
> > >
> > > I had expected that:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> > > summary(lm(y ~ x, data=rbind(df,df)))
> > >
> > > would be equivalent, but they are not. I suspect the difference is
> > > how the degrees of freedom is calculated - I had expected it to be
> > > sum(weights), but seems to be sum(weights > 0). This seems
> > > unintuitive to me:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> > > summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
> > >
> > > What am I missing? And what is the usual way to do a linear
> > > regression when you have aggregated data?
> > >
> > > Thanks,
> > >
> > >
> > > ______________________________________________
> > > R-help_at_stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>
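The suggestion in the quoted message to "regenerate the unaggregated data" can be sketched as follows. This is only a minimal illustration under stated assumptions: the integer counts in `w` are hypothetical case weights invented for the example, and the names `expanded`, `fit_w` and `fit_e` do not appear in the thread.

```r
# Simulated data as in the original question; the seed is an arbitrary choice.
set.seed(1)
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

# Hypothetical case weights: each row stands for 1 or 3 identical cases.
w <- rep(c(1, 3), each = 50)

# lm() interprets `weights` as variance weights, not as counts.
fit_w <- lm(y ~ x, data = df, weights = w)

# Regenerate the unaggregated data by repeating row i exactly w[i] times.
expanded <- df[rep(seq_len(nrow(df)), times = w), ]
fit_e <- lm(y ~ x, data = expanded)

# Same coefficients, different residual degrees of freedom.
all.equal(coef(fit_w), coef(fit_e))
df.residual(fit_w)  # 98  = 100 rows - 2 coefficients
df.residual(fit_e)  # 198 = sum(w) - 2 coefficients

# For positive integer case weights the standard errors differ only by
# the ratio of the residual degrees of freedom.
se_w <- coef(summary(fit_w))[, "Std. Error"]
se_e <- coef(summary(fit_e))[, "Std. Error"]
all.equal(se_e, se_w * sqrt(df.residual(fit_w) / df.residual(fit_e)))
```

Expanding the rows is only practical when the counts are modest integers. The rescaling in the last line follows from both fits sharing the same weighted residual sum of squares and the same weighted cross-product matrix, so only the divisor used for the residual variance changes.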

Received on Wed 09 May 2007 - 11:23:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 09 May 2007 - 13:31:34 GMT.
