Re: [R] large survey data set

From: Thomas Lumley (tlumley@u.washington.edu)
Date: Sat 29 Jun 2002 - 02:23:49 EST


Message-id: <Pine.A41.4.44.0206280857470.123502-100000@homer04.u.washington.edu>

On Fri, 28 Jun 2002, Andrew Perrin wrote:

> This is interesting and a bit disturbing. I've been using the weights=
> syntax to assign a case-weighting system in a survey dataset as well. Can
> you send me somewhere for documentation of the differences?

There's some discussion in
   http://www.niesr.ac.uk/niesr/wers98/Purdpap4.pdf

I don't know of a reference book that describes both kinds of weights
(variance weights and probability weights) -- they tend to be used by
non-overlapping groups of people -- but the two sets of formulas should be
easy to find.

For linear regression the estimation in both cases is done by multiplying
each row of the X and Y matrices by the square root of its weight and then
doing ordinary least squares. The difference is that with variance weights
this transformed least squares fit will have constant-variance residuals,
but with probability weights it typically won't, so the usual standard
errors are wrong.
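
As a rough illustration of the point estimates only (not the standard
errors), here is a minimal sketch in plain Python, assuming a single
predictor plus an intercept. The function names and the toy data are made
up for this example and are not from any package. It shows that scaling
the rows by the square root of the weights and running ordinary least
squares gives the same coefficients as solving the weighted normal
equations directly.

```python
import math


def solve_2x2(a, b):
    """Solve the 2x2 linear system a * x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - b[0] * a[1][0]) / det,
    ]


def ols(rows, y):
    """Ordinary least squares for a two-column design matrix."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(2)]
    return solve_2x2(xtx, xty)


def weighted_fit_by_transform(x, y, w):
    """Scale each row of X and Y by sqrt(weight), then do plain OLS."""
    s = [math.sqrt(wi) for wi in w]
    rows = [[si, si * xi] for si, xi in zip(s, x)]  # intercept column, x column
    ys = [si * yi for si, yi in zip(s, y)]
    return ols(rows, ys)


def weighted_fit_direct(x, y, w):
    """Solve the weighted normal equations (X'WX) b = X'Wy directly."""
    swx = sum(wi * xi for wi, xi in zip(w, x))
    xtwx = [[sum(w), swx],
            [swx, sum(wi * xi * xi for wi, xi in zip(w, x))]]
    xtwy = [sum(wi * yi for wi, yi in zip(w, y)),
            sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))]
    return solve_2x2(xtwx, xtwy)


# Toy data (hypothetical): five observations with unequal weights.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
w = [1.0, 2.0, 1.0, 3.0, 1.0]

b_transform = weighted_fit_by_transform(x, y, w)
b_direct = weighted_fit_direct(x, y, w)
print("intercept, slope via sqrt-weight transform:", b_transform)
print("intercept, slope via weighted normal equations:", b_direct)
```

The coefficients agree whichever kind of weight w represents; what
differs between the two interpretations is how the variance of those
coefficients should be estimated, which this sketch does not attempt.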

For linear regression there will only be serious problems if you have a
variable that strongly predicts both the outcome and the weights and has a
skewed distribution. For logistic or Poisson regression, where there
isn't a free dispersion parameter available, the problems can be worse.

        -thomas



