From: Thomas Lumley <tlumley_at_u.washington.edu>

Date: Sat 08 Jul 2006 - 07:34:36 EST

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Jul 08 07:40:26 2006

Date: Sat 08 Jul 2006 - 07:34:36 EST

On Fri, 7 Jul 2006, Mark Hempelmann wrote:

> library(survey)

*> multi3 <- data.frame(cluster=c(1,1,1,1 ,2,2,2, 3,3), id=c(1,2,3,4,
**> 1,2,3, 1,2),
**> nl=c(4,4,4,4, 3,3,3, 2,2), Nl=c(100,100,100,100, 50,50,50, 75,75),
**> M=rep(23,9),
**> y=c(23,33,77,25, 35,74,27, 37,72) )
**>
**> dmulti3 <- svydesign(id=~cluster+id, fpc=~M+Nl, data=multi3)
**> svymean (~y, dmulti3)
**> mean SE
**> y 45.796 5.5483
**>
**> svytotal(~y, dmulti3)
**> total SE
**> y 78999 13643
**>
**> and I estimate the population total as N=M/m sum(Nl) =
**> 23/3*(100+50+75)=1725. With this, my variance estimator is:
**> y1<-mean(multi3$y[1:4]) # 39.5
**> y2<-mean(multi3$y[5:7]) # 45.33
**> y3<-mean(multi3$y[8:9]) # 54.5
**>
**> yT1<-100*y1 # 3950 total cluster 1
**> yT2<-50*y2 # 2266.67 total cluster 2
**> yT3<-75*y3 # 4087.5 total cluster 3
**> ybarT<-1/3*sum(yT1,yT2,yT3) # 3434.722
**> s1 <- var(multi3$y[1:4]) # 643.67 var cluster 1
**> s2 <- var(multi3$y[5:7]) # 632.33 var cluster 2
**> s3 <- var(multi3$y[8:9]) # 612.5 var cluster 3
**>
**> var.yT <- 23^2*( 20/23*1/6*sum(
**> (yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) +
**> 1/69 * sum(100*96*s1, 50*47*s2, 75*73*s3) ) # 242 101 517
*

I don't have any of my reference books here today, but if you use

var.yT <- 23^2*( 20/23*1/6*sum(

(yT1-ybarT)^2,(yT2-ybarT)^2,(yT3-ybarT)^2 ) +
1/69 * sum(100*96*s1/4, 50*47*s2/3, 75*73*s3/2) ) # 242 101 517
the results agrees with svytotal(), and with Stata, and with formulas in a
couple of sets of lecture notes I found by Googling.

*> but
*

> var.yT/1725^2 = 81.36157

*> SE = 9.02006,
**> but it should be SE=13643/1725=7.90899
**>
**> Is this calculation correct? I remember svytotal using a different
**> variance estimator compared to svymean, and that svytotal gives the
**> unbiased estimation.
*

This calculation is not correct for the mean, since it ignores the uncertainty in the estimated population total. The correct standard error comes from treating the mean as a ratio of estimated total to estimated population size. In this case you have to do it that way since you don't know the population size, but R always does it this way. Because the estimated population size and total are correlated, taking into account the uncertainty in the denominator actually reduces the standard error.

The easiest way to reproduce the result that R gets is to do it the same
way that R does: compute the standard error of the mean as the standard
error of the total of a suitable set of estimating functions. If you
define a new variable (y-45.796*1)/1725 and estimate the standard error of
the total of this variable it will give:

*> svytotal(~I((y-45.796)/1725),dmulti3)
*

I((y - 45.796)/1725) 0.0002963 5.5482

which is what svymean() gives for the standard error of the mean of y.
Using your formula for the variance of the total (with the corrections
above) on this variable also gives

*> sqrt(var.yT)
*

[1] 5.54824

-thomas

Thomas Lumley Assoc. Professor, Biostatistics tlumley@u.washington.edu University of Washington, Seattle ______________________________________________R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Jul 08 07:40:26 2006

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Sat 08 Jul 2006 - 10:15:33 EST.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*