**From:** Han-Lin Lai (*Han-Lin.lai@noaa.gov*)

**Date:** Sat 22 May 2004 - 04:06:58 EST


Message-id: <40AE4542.BC091B9D@noaa.gov>

Hi all,

Thanks to Robert Baskin, Thomas Lumley, and Spencer Graves for the valuable help. I have learned a lot from this discussion.

I have put all the messages together without editing, so we can see how things evolved. I clearly have a lot of articles to read. As the discussion suggests, a mixed-modeling approach is possible but may be overkill for the data analysis I posted. I will explore the other plausible methods suggested in the discussion.

Best regards,

Han

---

From: Thomas Lumley

On Thu, 20 May 2004, Spencer Graves wrote:

> Cassel, Sarndal and Wretman (1977) Foundations of Inference in Survey Sampling (Krieger) insisted that for infinite population inference (what Deming called an 'analytic study'), the sampling probabilities should be ignored UNLESS they related somehow to something of interest in the model. In other words, is the sampling informative or noninformative? If noninformative, the sampling probabilities do not appear in the likelihood and therefore should not affect inference. As I recall, Cassel, Sarndal and Wretman said that if stratified random sampling is used, and if the stratification system is included in the model, then the sampling is noninformative, and the sampling probabilities should not affect inference.

This is the point of including the sampling weights as a predictor. These weights carry all the informativeness of the sampling scheme, and so correctly modelling them is sufficient. If the sampling is already non-informative, then including them as a predictor is harmless.

However, my point was that you may not want to condition on all the variables that go into the sampling scheme, in which case the simplest solution may be design-based inference.

-thomas
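Thomas's contrast between modelling the weights and design-based inference can be seen in a small simulation. This is a sketch under made-up assumptions (plain Python rather than R; the population, the selection probabilities 0.8 and 0.1, and the helper names `wls_line`, `ols` and `simulate` are all hypothetical). Selection depends on a variable Z that is correlated with both X and Y. Ignoring the design biases the slope of Y on X; weighting by 1/p recovers the census slope of Y given X; putting the weight in as a covariate gives a valid estimate of a different quantity, the slope of Y given X and Z. Only point estimates are computed; design-based standard errors are omitted.

```python
import random


def wls_line(x, y, w):
    """Weighted least-squares (intercept, slope) for a straight line y ~ x.

    With w = 1/p (inverse selection probabilities) this is the usual
    design-weighted point estimate of the census regression line; with
    all w equal it is ordinary least squares.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def ols(rows, y):
    """Unweighted least squares via the normal equations (small p only)."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(a[k][c]))
        a[c], a[piv] = a[piv], a[c]
        for k in range(p):
            if k != c:
                f = a[k][c] / a[c][c]
                a[k] = [akj - f * acj for akj, acj in zip(a[k], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate(seed=1, n_pop=20000):
    """Hypothetical population in which selection depends on Z.

    P(Z=1 | X=x) = x and Y = X + 2Z + noise, so the census regression of
    Y on X alone has slope 3, while Y given X and Z has slope 1 on X.
    """
    rng = random.Random(seed)
    xs, ys, ws = [], [], []
    for _ in range(n_pop):
        x = rng.random()
        z = 1 if rng.random() < x else 0
        y = x + 2 * z + rng.gauss(0, 0.1)
        p = 0.8 if z else 0.1          # selection probability depends on Z only
        if rng.random() < p:           # independent (Poisson) sampling
            xs.append(x)
            ys.append(y)
            ws.append(1 / p)           # sampling weight = 1 / p
    return xs, ys, ws


xs, ys, ws = simulate()
unweighted = wls_line(xs, ys, [1.0] * len(xs))[1]          # ignores the design
design = wls_line(xs, ys, ws)[1]                            # weights 1/p
model = ols([[1.0, x, w] for x, w in zip(xs, ws)], ys)[1]   # weight as covariate
print(f"unweighted {unweighted:.2f}  design-weighted {design:.2f}  "
      f"weight-as-covariate {model:.2f}")
```

The three slopes should come out roughly near 2, 3 and 1. Because the weight here is a function of Z alone, using it as a covariate is equivalent to conditioning on Z, which is exactly the situation Thomas warns about when the target is Y given X only.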

---

From: Spencer Graves

Cassel, Sarndal and Wretman (1977) Foundations of Inference in Survey Sampling (Krieger) insisted that for infinite population inference (what Deming called an 'analytic study'), the sampling probabilities should be ignored UNLESS they related somehow to something of interest in the model. In other words, is the sampling informative or noninformative? If noninformative, the sampling probabilities do not appear in the likelihood and therefore should not affect inference. As I recall, Cassel, Sarndal and Wretman said that if stratified random sampling is used, and if the stratification system is included in the model, then the sampling is noninformative, and the sampling probabilities should not affect inference.

Under this paradigm, using weights inversely proportional to sampling probabilities is (primarily?) a tool for finite population inference, what Deming called an 'enumerative study'. For an enumerative study, the purpose is to make inference about a fixed, finite population, e.g., how to feed the people in Japan who would otherwise starve within the next week or month, which was the situation when Deming directed a survey there shortly after World War II. For an analytic study, the purpose is more long term, e.g., how to design a national alimentary system to feed the people who will be there 10 or 30 years from now. Since most of my work has dealt with processes that will create the future, rather than with fixed, finite populations, I have ignored sampling probabilities in most of my work (though I have not worked much recently with sample surveys).
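The enumerative use of inverse-probability weights can be made concrete with a toy stratified sample (all numbers hypothetical, in Python rather than R; `hajek_mean` is an illustrative helper, not a library function). A small stratum is over-sampled, so the unweighted mean over-represents it; weighting each unit by the inverse of its selection probability estimates the finite-population mean. The weighted estimate coincides with the classical stratified estimator, which echoes the point that once the strata enter the analysis explicitly, the weights add nothing further for this estimate.

```python
def hajek_mean(y, w):
    """Weighted (Hajek) estimate of a finite-population mean: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Hypothetical stratified random sample: (value, stratum) pairs.
sample = [(2.0, "A"), (4.0, "A"), (10.0, "B"), (12.0, "B")]
N_h = {"A": 20, "B": 4}   # stratum sizes in the finite population
n_h = {"A": 2, "B": 2}    # units sampled from each stratum

y = [v for v, _ in sample]
# Selection probability is n_h / N_h, so the weight is its inverse, N_h / n_h.
w = [N_h[s] / n_h[s] for _, s in sample]

unweighted = sum(y) / len(y)   # 7.0: over-represents the small stratum B
weighted = hajek_mean(y, w)    # 104 / 24, about 4.33

# The weighted mean equals the stratified estimator sum_h N_h * ybar_h / N.
ybar = {s: sum(v for v, t in sample if t == s) / n_h[s] for s in N_h}
stratified = sum(N_h[s] * ybar[s] for s in N_h) / sum(N_h.values())
print(unweighted, weighted, stratified)
```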

Is this still consistent with current thinking? Is it feasible to summarize in a few words what Pfeffermann, Korn et al. say about this?

Thanks,

spencer graves

Thomas Lumley wrote:

---

> On Thu, 20 May 2004, Baskin, Robert wrote:
>
>> Han-Lin
>>
>> I don't think I have seen a reply, so I will suggest that maybe you could try a different approach than what you are thinking about doing. I believe the current best practice is to use the weights as a covariate in a regression model, and, by the way, the weights are the inverse of the probabilities of selection, not the probabilities.
>>
>> Fundamentally, there is a difficulty in making sense out of 'random effects' in a finite population setting.

>
> I would have thought that it matters why you are fitting a mixed model. Often people use mixed models when they are just interested in inference about the mean and need to model the covariances to get valid standard errors. In that situation you could use an ordinary survey regression to get a design-based result.
>
> If you are actually interested in variance components, then you need some other approach, and putting the weights into the model as a covariate will presumably give a valid model-based result (since the weights carry all the biased sampling information, like a propensity score). Presumably this is also more efficient.
>
> However, it could well be that you don't want those variables in the model. If the sampling depends on a variable Z correlated with Y and X, and you want to model the distribution of Y given X, not the distribution of Y given X and Z, you are still in trouble.
>
> -thomas

>> (plagiarized from some unknown source)
>>
>> See: Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., and Rasbash, J. (1998), "Weighting for unequal selection probabilities in multilevel models (Disc: p41-56)", Journal of the Royal Statistical Society, Series B, Methodological, 60, 23-40.
>>
>> which refers back to: Pfeffermann, D., and LaVange, L. (1989), "Regression models for stratified multi-stage cluster samples", Analysis of Complex Surveys, 237-260.
>>
>> If you don't like statistical papers, then see section 4.5 of Korn, Edward Lee, and Graubard, Barry I. (1999), "Analysis of health surveys", John Wiley & Sons (New York; Chichester). They explain the idea of using weights in a model fairly simply.
>>
>> Bob

>>
>> -----Original Message-----
>>
>> From: Han-Lin Lai [mailto:Han-Lin.Lai@noaa.gov]
>>
>> Sent: Wednesday, May 19, 2004 12:47 PM
>>
>> To: r-help@stat.math.ethz.ch
>>
>> Subject: [R] mixed models for analyzing survey data with unequal selection probability

>> Hi,
>>
>> I need help on this topic because it is outside my statistical training as a biologist. Here is a brief description of the problem.
>>
>> I have a survey in which VESSELs are selected at random with probability p(j). Then the tows within the jth VESSEL are sampled at random with probability p(i|j). I write my model as
>>
>> y = XB + Zb + e
>>
>> where XB is the fixed part, Zb is the random effect (VESSEL), and e is the within-vessel error.
>>
>> I feel that I should weight the Zb part by p(j) and the e part by p(i,j) = p(j)*p(i|j). Is this the correct weighting?
>>
>> How can I implement the weighting in nlme (or lme)? I think that p(i,j) can be specified by nlme(..., weights=p(i,j), ...)? Where is p(j) to be used in nlme?
>>
>> I would appreciate it if anyone could provide examples and literature for this problem.
>>
>> Cheers!
>>
>> Han
>>
>> ______________________________________________
>>
>> R-help@stat.math.ethz.ch mailing list
>>
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>>
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

>
> Thomas Lumley Assoc. Professor, Biostatistics
>
> tlumley@u.washington.edu University of Washington, Seattle

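Bob's remark that weights are inverse probabilities, applied to the two-stage design in Han's question, can be sketched as follows (Python rather than R; the helper `two_stage_weights`, its dictionary keys, and the numbers are hypothetical). It computes a vessel-level weight 1/p(j), a tow-within-vessel weight 1/p(i|j), and the overall weight 1/p(i,j) = 1/(p(j)*p(i|j)). How such weights should enter a multilevel fit is the subject of the Pfeffermann et al. (1998) paper cited above, and the sketch does not attempt that step. Note also that the `weights` argument of `lme`/`nlme` in nlme describes a within-group variance structure (a `varFunc` object), so it is not a place to pass sampling weights.

```python
def two_stage_weights(tows):
    """Inverse-probability weights for a two-stage (vessel, then tow) design.

    Each tow is a dict with hypothetical keys 'vessel', 'p_vessel' (= p(j))
    and 'p_tow' (= p(i|j)). Returns one dict per tow with the stage-level
    weights and the overall weight 1 / (p(j) * p(i|j)).
    """
    out = []
    for t in tows:
        w_vessel = 1.0 / t["p_vessel"]   # 1 / p(j)
        w_tow = 1.0 / t["p_tow"]         # 1 / p(i|j)
        out.append({"vessel": t["vessel"], "w_vessel": w_vessel,
                    "w_tow": w_tow, "w_overall": w_vessel * w_tow})
    return out


# Hypothetical data: vessel A sampled with p(j) = 0.25 and two of its tows
# with p(i|j) = 0.5; vessel B sampled with p(j) = 0.5 and one tow with 0.125.
tows = [
    {"vessel": "A", "p_vessel": 0.25, "p_tow": 0.5},
    {"vessel": "A", "p_vessel": 0.25, "p_tow": 0.5},
    {"vessel": "B", "p_vessel": 0.5, "p_tow": 0.125},
]
weights = two_stage_weights(tows)

# Summing overall weights estimates the number of tows in the population;
# summing vessel weights over distinct vessels estimates the number of vessels.
est_tows = sum(r["w_overall"] for r in weights)
est_vessels = sum({r["vessel"]: r["w_vessel"] for r in weights}.values())
print(est_tows, est_vessels)   # 32.0 6.0
```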



*This archive was generated by hypermail 2.1.3: Mon 31 May 2004 - 23:05:12 EST*