From: Thomas Lumley <tlumley_at_u.washington.edu>

Date: Fri 27 May 2005 - 06:04:32 EST


R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Fri May 27 06:07:28 2005


On Thu, 26 May 2005, Mark Hempelmann wrote:

> Dear WizaRds,
>
> Working through sampling theory, I tried to comprehend the concept of
> stratification and apply it with Survey to a small example. My question is
> more of theoretic nature, so I apologize if this does not fully fit this
> board's intention, but I have come to a complete stop in my efforts and need
> an expert to help me along. Please help:
>
> age<-matrix(c(rep(1,5), rep(2,3), 1:8, rep(3,5), rep(4,3), rep(5,5),
> rep(3,3), rep(15,5), rep(12,3), 23,25,27,21,22, 33,27,29), ncol=6, byrow=F)
> colnames(age)<-c("stratum", "id", "weight", "nh", "Nh", "y")
> age<-as.data.frame(age)

Ok. Assuming that Nh are the population sizes in each stratum, you have 5/15 sampled in stratum 1 and 3/12 in stratum 2.

This can be specified in a number of ways. You can use

  sampling weights of 15/5 and 12/3, or
  sampling probabilities of 5/15 and 3/12,

with or without specifying the finite population correction. The
finite population correction can be specified as 15 and 12 or 5/15 and
3/12, and if the finite population correction is specified the weights are
then optional.
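As a small base-R sketch of the relationship described above (the vector names nh, Nh, prob and weight are chosen here for illustration and are not objects from the survey package): the sampling weight in each stratum is the reciprocal of the sampling probability, and the weights of the sampled units add up to the population size.

```r
# Stratum sample sizes and population sizes from the example
nh <- c(5, 3)
Nh <- c(15, 12)

prob   <- nh / Nh   # sampling probabilities: 5/15 and 3/12
weight <- Nh / nh   # sampling weights: 15/5 and 12/3

# One weight per sampled unit; their sum estimates the population size
unit_weights <- rep(weight, times = nh)
total_weight <- sum(unit_weights)

print(cbind(nh, Nh, prob, weight))
print(total_weight)   # 27, i.e. 15 + 12
```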

So

d1<-svydesign(ids=~id, strata=~stratum, weight=~I(Nh/nh), data=age)
d2<-svydesign(ids=~id, strata=~stratum, prob=~I(nh/Nh), data=age)

give the with-replacement design (agreeing with your age.des3) and

d3<-svydesign(ids=~id, strata=~stratum, weight=~I(Nh/nh), fpc=~Nh,data=age)
d4<-svydesign(ids=~id, strata=~stratum, prob=~I(nh/Nh), fpc=~Nh,data=age)
d5<-svydesign(ids=~id, strata=~stratum, weight=~I(Nh/nh), fpc=~I(nh/Nh),data=age)
d6<-svydesign(ids=~id, strata=~stratum, prob=~I(nh/Nh), fpc=~I(nh/Nh),data=age)
d7<-svydesign(ids=~id, strata=~stratum, fpc=~Nh,data=age)
d8<-svydesign(ids=~id, strata=~stratum, fpc=~I(nh/Nh),data=age)

all give the without-replacement design. We get

> svymean(~y,d1)
    mean     SE
y 26.296 0.9862
> svymean(~y,d2)
    mean     SE
y 26.296 0.9862
> svymean(~y,d3)
    mean     SE
y 26.296 0.8364
> svymean(~y,d4)
    mean     SE
y 26.296 0.8364
> svymean(~y,d5)
    mean     SE
y 26.296 0.8364
> svymean(~y,d6)
    mean     SE
y 26.296 0.8364
> svymean(~y,d7)
    mean     SE
y 26.296 0.8364
> svymean(~y,d8)
    mean     SE
y 26.296 0.8364
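For readers who want to check these numbers by hand, here is a base-R sketch using the textbook stratified-sampling formulas. It is an illustration under that assumption, not a description of how svymean() is implemented internally; the names ybar_h, s2_h, W_h, se_wr and se_wor are made up for this example.

```r
# Data from the original post
age <- data.frame(
  stratum = c(rep(1, 5), rep(2, 3)),
  nh      = c(rep(5, 5), rep(3, 3)),
  Nh      = c(rep(15, 5), rep(12, 3)),
  y       = c(23, 25, 27, 21, 22, 33, 27, 29)
)

# Per-stratum summaries
ybar_h <- tapply(age$y, age$stratum, mean)   # stratum means
s2_h   <- tapply(age$y, age$stratum, var)    # stratum sample variances
n_h    <- tapply(age$nh, age$stratum, function(x) x[1])
N_h    <- tapply(age$Nh, age$stratum, function(x) x[1])
W_h    <- N_h / sum(N_h)                     # stratum population shares

# Stratified mean
strat_mean <- sum(W_h * ybar_h)

# Standard error without and with the finite population correction
se_wr  <- sqrt(sum(W_h^2 * s2_h / n_h))
se_wor <- sqrt(sum(W_h^2 * (1 - n_h / N_h) * s2_h / n_h))

print(round(c(mean = strat_mean, se_wr = se_wr, se_wor = se_wor), 4))
```

The rounded values agree with the output above: mean 26.2963, SE 0.9862 for the with-replacement designs and 0.8364 for the without-replacement designs.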

Now, looking at your examples

> ## create survey design object
> age.des1<-svydesign(ids=~id, strata=~stratum, weight=~Nh, data=age)
> svymean(~y, age.des1)
> ## gives mean 25.568, SE 0.9257

This is wrong: the sampling weight is Nh/nh, not Nh

> age.des2<-svydesign(ids=~id, strata=~stratum, weight=~I(nh/Nh), data=age)
> svymean(~y, age.des2)
> ## gives mean 25.483, SE 0.9227

This is wrong: the sampling weight is Nh/nh. You need prob=~I(nh/Nh) to specify sampling fractions.

> age.des3<-svydesign(ids=~id, strata=~stratum, weight=~weight, data=age)
> svymean(~y, age.des3)
> ## gives mean 26.296, SE 0.9862

This is correct and agrees with d1 and d2

> age.des4<-svydesign(ids=~id, strata=~stratum, data=age)
> svymean(~y, age.des4)
> ## gives mean 25.875, SE 0.9437

This is a stratified, unweighted mean, ie mean(age$y).
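The four point estimates can be reproduced without the survey package, assuming the estimated mean is the weighted mean sum(w*y)/sum(w). This sketch uses base R's weighted.mean(); the variable names are illustrative only.

```r
# Data from the original post
age <- data.frame(
  nh = c(rep(5, 5), rep(3, 3)),
  Nh = c(rep(15, 5), rep(12, 3)),
  y  = c(23, 25, 27, 21, 22, 33, 27, 29)
)

m_des1 <- weighted.mean(age$y, age$Nh)            # weight = Nh (wrong)
m_des2 <- weighted.mean(age$y, age$nh / age$Nh)   # weight = nh/Nh (wrong)
m_des3 <- weighted.mean(age$y, age$Nh / age$nh)   # weight = Nh/nh (correct)
m_des4 <- mean(age$y)                             # no weights at all

print(round(c(des1 = m_des1, des2 = m_des2, des3 = m_des3, des4 = m_des4), 3))
```

Rounded to three decimals these are 25.568, 25.483, 26.296 and 25.875, matching the means quoted for age.des1 to age.des4.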

> age.des3 is the only estimator I am able to compute per hand correctly. It is
> stratified random sampling with inverse probablility weighting with weight=
> nh/Nh ## sample size/ stratum size.
>
> Basically, I thought the option weight=~Nh as well as weight=~I(nh/Nh) would
> result in the same number, but it does not.

No, it does not. A weight of 3 is not the same as a weight of 1/3. With the finite population correction it is safe to assume that numbers less than 1 are sampling fractions and numbers greater than 1 are population sizes, but this isn't safe when it comes to weights. It is possible that someone could want to use sampling weights less than 1.
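To make the finite-population-correction rule concrete, here is a sketch of the kind of disambiguation described above. The function fpc_to_fraction() is a hypothetical helper written for this illustration; it is not a function from the survey package and is not claimed to mirror its internals.

```r
# Hypothetical helper: turn an fpc value into a sampling fraction.
# Values below 1 are taken to be sampling fractions already; anything
# else is taken to be a population size and divided into the sample size.
fpc_to_fraction <- function(fpc, n) {
  ifelse(fpc < 1, fpc, n / fpc)
}

n_h <- c(5, 3)

f_from_sizes     <- fpc_to_fraction(c(15, 12), n_h)        # given as Nh
f_from_fractions <- fpc_to_fraction(c(5/15, 3/12), n_h)    # given as nh/Nh

print(f_from_sizes)
print(f_from_fractions)
```

Both calls give the same sampling fractions, 1/3 and 1/4. No comparable rule is safe for weights, because a weight below 1 may be exactly what the user intends.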

> I thought the Hansen-Hurwitz estimator per stratum offers the right numbers:
> p1=5/15, p2=3/12, so y1.total=1/5*(3*118), y2.total=1/3*(4*89) and the
> stratified estimator with this design should be: 1/27(y1.total+y2.total),
> obviously wrong.

Since this gives a mean of about 7.02 for numbers around 25, it can't be right. You have divided by the sample size twice: the weights 3 and 4 are already Nh/nh. You should have

y1.total<-3*118

y2.total<-4*89

You then will get (y1.total+y2.total)/27 to be 26.29630, in agreement
with svymean().
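The arithmetic can be checked in a few lines of base R. The names wrong_mean and right_mean are illustrative; the stratum sums 118 and 89 are computed from the posted data.

```r
# y values by stratum, from the original post
y1 <- c(23, 25, 27, 21, 22)   # stratum 1, sum 118
y2 <- c(33, 27, 29)           # stratum 2, sum 89
N  <- 15 + 12                 # total population size, 27

# Original attempt: weights 3 and 4 AND an extra division by nh
wrong_mean <- (1/5 * (3 * sum(y1)) + 1/3 * (4 * sum(y2))) / N

# Corrected: the weights Nh/nh already account for the sample size
y1.total   <- 3 * sum(y1)
y2.total   <- 4 * sum(y2)
right_mean <- (y1.total + y2.total) / N

print(c(wrong = wrong_mean, right = right_mean))
```

The first value is about 7.017 and the second is 26.2963.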

-thomas

