[R] Multistage Sampling

From: Mark Hempelmann <e.rehak_at_t-online.de>
Date: Sat 08 Jul 2006 - 05:05:29 EST


Dear WizaRds, dear Thomas,

    First of all, I want to tell you how grateful I am for all your support. I wish I will be able to help others along one day the same way you do. Thank you so much. I am struggling with a multistage sampling design:

library(survey)
multi3 <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4,

1,2,3, 1,2),
nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75), 
M=rep(23,9),

y=c(23,33,77,25, 35,74,27, 37,72) )

dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3) svymean (~y, dmulti3)

    mean SE
y 45.796 5.5483

svytotal(~y, dmulti3)
  total SE
y 78999 13643

and I estimate the population total as N=M/m sum(Nl) = 23/3*(100+50+75)=1725. With this, my variance estimator is:

y1<-mean(multi3$y[1:4]) # 39.5
y2<-mean(multi3$y[5:7]) # 45.33
y3<-mean(multi3$y[8:9]) # 54.5

yT1<-100*y1 # 3950 total cluster 1

yT2<-50*y2 # 2266.67 total cluster 2
yT3<-75*y3 # 4087.5 total cluster 3
ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722
s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1
s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2
s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3

var.yT <- 23^2*( 20/23*1/6*sum(
(yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) + 1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517

but
var.yT/1725^2 = 81.36157
SE = 9.02006,
but it should be SE=13643/1725=7.90899

Is this calculation correct? I remember svytotal using a different variance estimator compared to svymean, and that svytotal gives the unbiased estimation. To solve the problem, I went ahead and tried to calibrate the design object, telling Survey the population total N=1725:

dmulti3.cal <- calibrate(dmulti3, ~1, pop=1725) svymean (~y, dmulti3.cal)

    mean SE
y 45.796 5.5483

svytotal(~y, dmulti3.cal)
  total SE
y 78999 9570.7

, which indeed gives me the computed svymean SE, but alas, I still don't know why my variance is so different. I think it might have sthg to do with a differently computed N and the fact that your estimator formula is a different one. Since I calculated the Taylor Series solution, i suppose there must be another approach? The calibration help page tells me to enter a list of population total vectors for each cluster, which would result in:

dmulti3.cal <- calibrate(dmulti3, ~1, pop=c(100,50,75)) Error in regcalibrate.survey.design2(design, formula, population, aggregate.stage = aggregate.stage, :
Population and sample totals are not the same length.

I am very grateful for your help and wish you alle the best Yours
mark



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Jul 08 05:11:39 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 08 Jul 2006 - 08:15:26 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.