Re: [R] Weighted least squares

From: Tim Hesterberg
Date: Mon, 11 Jun 2007 10:26:05 -0700

As John noted, there are different kinds of weights, and different terminology:

* inverse-variance weights (accuracy weights)
* case weights (frequencies, counts)
* sampling weights (selection probability weights)

I'll add:
* inverse-variance weights, where var(y for observation) = 1/weight
  (as opposed to just being inversely proportional to the weight)
* weights used as part of an algorithm (e.g. for robust estimation,
  or glm's using iteratively-reweighted least-squares).
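
To illustrate the last kind of weights, here is a minimal sketch of iteratively reweighted least squares for a Poisson regression with a log link. The data are simulated and the starting values and stopping rule are arbitrary choices for illustration; this is the textbook iteration, not a copy of what glm() does internally. The weights passed to lm.wfit() are recomputed at every step by the algorithm and are not a property of the data.

```r
# Simulated data (made up for illustration).
set.seed(42)
x <- seq(0, 2, length.out = 30)
y <- rpois(30, lambda = exp(0.5 + 0.8 * x))

X <- cbind(1, x)                 # design matrix with intercept
beta <- c(log(mean(y)), 0)       # crude starting values

for (iter in 1:25) {
  eta <- drop(X %*% beta)        # linear predictor
  mu  <- exp(eta)                # fitted means under the log link
  z   <- eta + (y - mu) / mu     # working response
  w   <- mu                      # working weights for Poisson / log link
  beta_new <- lm.wfit(X, z, w)$coefficients   # one weighted least-squares step
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Reference fit for comparison.
fit_glm <- glm(y ~ x, family = poisson)
```

The final beta should agree with coef(fit_glm) up to convergence tolerance.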

For linear regression, the type of weights doesn't affect the calculation of the regression coefficients, but does affect inferences such as standard errors for the regression coefficients, degrees of freedom for variance estimates, etc.
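
A small sketch of this point, using made-up aggregated data in which the column n is a count (a case weight). Passing the counts to lm() as weights gives the same coefficients as fitting the regenerated, unaggregated data, but different residual degrees of freedom and hence different standard errors. The variable names are hypothetical.

```r
# Made-up aggregated data: each row stands for n identical observations.
agg <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1),
                  n = c(3, 1, 4, 2, 5))

# Fit 1: counts passed as lm() weights (lm treats them as inverse-variance weights).
fit_w <- lm(y ~ x, data = agg, weights = n)

# Fit 2: regenerate the unaggregated data and fit without weights.
full <- agg[rep(seq_len(nrow(agg)), times = agg$n), ]
fit_full <- lm(y ~ x, data = full)

# The coefficients agree.
coef_match <- isTRUE(all.equal(coef(fit_w), coef(fit_full)))

# The standard errors do not: residual df is 5 - 2 = 3 versus 15 - 2 = 13.
se_w    <- coef(summary(fit_w))[, "Std. Error"]
se_full <- coef(summary(fit_full))[, "Std. Error"]

# Both fits have the same weighted residual sum of squares, so the
# standard errors differ only through the degrees of freedom.
se_rescaled <- se_w * sqrt(df.residual(fit_w) / df.residual(fit_full))
```

Here se_rescaled should match se_full, which shows that in this simple case the correction for case weights amounts to using the total count, rather than the number of rows, when computing the residual degrees of freedom.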

lm() inferences assume the first type. Other formulae are appropriate for inferences for types 2-4. Combinations of types 1-4 require other formulae; this gets nontrivial. For the 5th type, inferences need to be handled by the algorithm that is using weighted linear regression.
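
As one example of an "other formula", consider the 4th type, where var(y) = 1/weight exactly. lm() still estimates a residual scale and multiplies it into the standard errors; when the variances are known, that scale would instead be fixed at 1. The sketch below uses simulated data and hypothetical weights w, and relies on the cov.unscaled component of summary.lm(), which holds the unscaled coefficient covariance matrix.

```r
# Simulated data with known variances: var(y_i) = 1 / w_i (w is made up).
set.seed(1)
x <- 1:20
w <- runif(20, 0.5, 4)
y <- 1 + 2 * x + rnorm(20, sd = 1 / sqrt(w))

fit <- lm(y ~ x, weights = w)
s   <- summary(fit)

# Standard errors as reported by lm(): scaled by the estimated sigma.
se_lm <- coef(s)[, "Std. Error"]

# Standard errors treating the variances as known (sigma fixed at 1):
# square roots of the diagonal of (X' W X)^{-1}.
se_known <- sqrt(diag(s$cov.unscaled))

# The same quantity computed directly from the design matrix, as a check.
X <- model.matrix(fit)
se_direct <- sqrt(diag(solve(crossprod(X, w * X))))
```

The two sets of standard errors differ by the factor s$sigma. With known variances one would also typically refer to a normal rather than a t distribution, since no variance is being estimated.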

Tim Hesterberg

John Fox wrote:
>I think that the problem is that the term "weights" has different meanings,
>which, although they are related, are not quite the same.
>The weights used by lm() are (inverse-)"variance weights," reflecting the
>variances of the errors, with observations that have low-variance errors
>therefore being accorded greater weight in the resulting WLS regression.
>What you have are sometimes called "case weights," and I'm unaware of a
>general way of handling them in R, although you could regenerate the
>unaggregated data. As you discovered, you get the same coefficients with
>case weights as with variance weights, but different standard errors.
>Finally, there are "sampling weights," which are inversely proportional to
>the probability of selection; these are accommodated by the survey package.
>To complicate matters, this terminology isn't entirely standard.

Received on Mon 11 Jun 2007 - 17:32:20 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.