From: Doran, Harold <HDoran_at_air.org>

Date: Mon, 18 Aug 2008 10:53:14 -0400

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 18 Aug 2008 - 15:04:06 GMT

Whoops, the final variance estimator var(f(Y)) should have N^4 in the
denominator, not N^2.
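
For reference, here is a sketch of where the N^4 comes from, in the same notation as the quoted LaTeX. Multiplying out the quadratic form with the gradient [1/N, -Y/N^2] should give:

```latex
\begin{equation}
var(f(Y)) = \frac{var(Y)}{N^2} - \frac{2Y\,cov(Y,N)}{N^3} + \frac{Y^2\,var(N)}{N^4}
          = \frac{N^2 var(Y) - 2 cov(Y,N) N Y + var(N) Y^2}{N^4}
\end{equation}
```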

> -----Original Message-----

*> From: r-help-bounces_at_r-project.org
**> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Doran, Harold
**> Sent: Monday, August 18, 2008 10:47 AM
**> To: Stas Kolenikov
**> Cc: r-help_at_r-project.org
**> Subject: Re: [R] Design-consistent variance estimate
**>
**> It also turns out that in educational testing it is rare to
**> consider the sampling design and to estimate
**> design-consistent standard errors. I appreciate your thoughts
**> on this, Stas. They gave me a much clearer picture of what
**> R's survey package and SAS proc surveymeans are doing. I've
**> copied some minimal LaTeX code below. My R code reflecting
**> this LaTeX replicates svymean() and the SAS procedures
**> exactly under all conditions that I have tested so far for a
**> one-stage cluster sample.
**>
**> It clearly reduces to a simpler expression when cluster
**> sizes are equal.
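
One way to see the reduction, as a sketch in the notation of the LaTeX further down: assume equal cluster sizes together with equal weights, so that every \hat{N}_j is the same. Then var(N) and cov(Y,N) are both zero, only the 1/N component of the gradient contributes, and the quadratic form collapses to:

```latex
\begin{equation}
var(f(Y)) = \frac{var(Y)}{N^2}
          = \frac{1}{N^2}\,\frac{k}{k-1} \sum_{j=1}^k(\hat{Y}_j-\hat{Y}_{..})^2
\end{equation}
```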
**>
**> My hat is off to sampling statisticians; this has got to be a
**> lot of fun for you :)
**>
**> ### LaTeX
**>
**> \documentclass[12pt]{article}
**> \usepackage{bm,geometry}
**> \begin{document}
**>
**> In this scenario, the appropriate procedure is to estimate
**> design-consistent standard errors. This is accomplished by
**> first defining the ratio estimator of the mean as:
**>
**> \begin{equation}
**> f(Y) = \frac{Y}{N}
**> \end{equation}
**>
**> \noindent where $Y$ is the total of the variable and $N$ is
**> the population size. Treating both $Y$ and $N$ as random
**> variables, a first-order Taylor series expansion of the ratio
**> estimator $f(Y)$ can be used to derive the design-consistent
**> variance estimator as:
**>
**> \begin{equation}
**> var(f(Y)) =
**> \left[\frac{\partial f(Y)}{\partial Y}, \frac{\partial f(Y)}{\partial N}\right]
**> \left[ \begin{array}{cc}
**> var(Y) & cov(Y,N)\\
**> cov(Y,N) & var(N)
**> \end{array} \right]
**> \left[\frac{\partial f(Y)}{\partial Y}, \frac{\partial f(Y)}{\partial N}\right]^T
**> \end{equation}
**>
**> \noindent where
**>
**> \begin{equation}
**> \left[\frac{\partial f(Y)}{\partial Y}\right] = \frac{1}{N}
**> \end{equation}
**>
**> \begin{equation}
**> \left[\frac{\partial f(Y)}{\partial N}\right] = -
**> \frac{Y}{N^2} \end{equation}
**>
**> \begin{equation}
**> var(Y) = \frac{k}{k-1} \sum_{j=1}^k(\hat{Y}_j-\hat{Y}_{..})^2
**> \end{equation}
**>
**> \begin{equation}
**> \hat{Y}_j = \sum_{i=1}^{n_j}\hat{Y}_{j(i)} \end{equation}
**>
**> \begin{equation}
**> \hat{Y}_{..} = k^{-1} \sum_{j=1}^k \hat{Y}_j \end{equation}
**>
**> \begin{equation}
**> var(N) = \frac{k}{k-1} \sum_{j=1}^k(\hat{N}_j-\hat{N}_{..})^2
**> \end{equation}
**>
**> \begin{equation}
**> \hat{N}_j = \sum_{i=1}^{n_j}\hat{N}_{j(i)} \end{equation}
**>
**> \begin{equation}
**> \hat{N}_{..} = k^{-1} \sum_{j=1}^k \hat{N}_j \end{equation}
**>
**> \begin{equation}
**> cov(Y,N) = \frac{k}{k-1} \sum_{j=1}^k(\hat{Y}_j-\hat{Y}_{..})
**> (\hat{N}_j-\hat{N}_{..})
**> \end{equation}
**>
**> \noindent where $j$ indexes the clusters $(1, 2, \ldots, k)$,
**> $j(i)$ indexes the $i$th member of cluster $j$, and $n_j$ is
**> the total number of members in cluster $j$.
**>
**> The estimate of the variance of $f(Y)$ is then taken as:
**>
**> \begin{equation}
**> var(f(Y)) = \frac{N^2var(Y) - 2cov(Y,N)NY + var(N)Y^2 }{N^2}
**> \end{equation}
**>
**> The standard error is then taken as:
**>
**> \begin{equation}
**> se = \sqrt{var(f(Y))}
**> \end{equation}
**>
**> \end{document}
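
The derivation above can also be sketched numerically. The following is a minimal illustration, not the R code mentioned in the message. It assumes that each observation carries a sampling weight w, that \hat{Y}_{j(i)} = w*y and \hat{N}_{j(i)} = w, and that Y and N are the sums of the cluster totals; the function name and the data layout are hypothetical. It uses the N^4 denominator from the correction at the top of this message.

```python
import math


def cluster_mean_se(clusters):
    """Ratio-estimator mean and linearized SE for a one-stage cluster sample.

    clusters: list of clusters, each a list of (y, w) pairs, where w is an
    assumed sampling weight (this data layout is hypothetical).
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters to estimate a variance")

    # Cluster totals: Y_j = sum of w*y, N_j = sum of w (assumed definitions).
    Y_j = [sum(w * y for y, w in c) for c in clusters]
    N_j = [sum(w for _, w in c) for c in clusters]

    Y, N = sum(Y_j), sum(N_j)
    Y_bar, N_bar = Y / k, N / k
    scale = k / (k - 1)

    # Between-cluster variances and covariance of the totals.
    var_Y = scale * sum((yj - Y_bar) ** 2 for yj in Y_j)
    var_N = scale * sum((nj - N_bar) ** 2 for nj in N_j)
    cov_YN = scale * sum(
        (yj - Y_bar) * (nj - N_bar) for yj, nj in zip(Y_j, N_j)
    )

    # Quadratic form multiplied out, with N^4 in the denominator.
    var_f = (N**2 * var_Y - 2 * N * Y * cov_YN + Y**2 * var_N) / N**4

    # Guard against tiny negative values caused by floating-point rounding.
    return Y / N, math.sqrt(max(var_f, 0.0))


if __name__ == "__main__":
    # Made-up data: three clusters of unequal size, all weights equal to 1.
    sample = [
        [(1, 1), (2, 1), (3, 1)],
        [(4, 1), (5, 1)],
        [(6, 1)],
    ]
    mean, se = cluster_mean_se(sample)
    print(mean, se)
```

Comparing such a sketch against svymean() on the same data would be the natural check; that comparison is not done here.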
**>
**> > -----Original Message-----
**> > From: Stas Kolenikov [mailto:skolenik_at_gmail.com]
**> > Sent: Monday, August 18, 2008 10:40 AM
**> > To: Doran, Harold
**> > Cc: r-help_at_r-project.org
**> > Subject: Re: [R] Design-consistent variance estimate
**> >
**> > On 8/16/08, Doran, Harold <HDoran_at_air.org> wrote:
**> > > In terms of the "design" (which is a term used loosely) the
**> > > schools were not randomly selected. They volunteered to
**> > > participate in a pilot study.
**> >
**> > Oh, that's a next level of disaster, then! You may have to work
**> > with treatment effect models, of which there are many:
**> > propensity score matching, nearest neighbor matching,
**> > instrumental variables, etc.
**> > Those methods require asymptotics in terms of number of treatment
**> > units, which would be schools -- and I would imagine those are
**> > numbered in dozens rather than thousands in your study, so
**> > straightforward application of those methods might be problematic...
**> > At least I would augment my analysis with propensity score weights:
**> > somehow estimate the (school level) probability of participating in
**> > the study (I imagine you have the school characteristics at hand
**> > for your complete universe of schools -- principal's education
**> > level, # of computers per student, fraction free/reduced price
**> > lunch, whatever... you probably know those better than I do :) ),
**> > and use the inverse of that probability as the probability weight.
**> > If the selection was informative, you might see quite different
**> > results in weighted and unweighted analysis.
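
The weighting step described above might look like the following minimal sketch. It assumes the school-level participation probabilities have already been estimated by some model (not shown). The function names and all numbers are made up for illustration.

```python
def inverse_probability_weights(propensities):
    """Turn estimated participation probabilities into probability weights."""
    for p in propensities:
        if not 0.0 < p <= 1.0:
            raise ValueError("each propensity must lie in (0, 1]")
    return [1.0 / p for p in propensities]


def weighted_mean(values, weights):
    """Weighted mean of school-level values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical school-level mean scores and estimated propensities.
    school_means = [50.0, 60.0, 70.0]
    propensities = [0.5, 0.25, 0.25]

    weights = inverse_probability_weights(propensities)
    unweighted = sum(school_means) / len(school_means)
    weighted = weighted_mean(school_means, weights)

    # A gap between the two would hint at informative selection.
    print(unweighted, weighted)
```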
**> >
**> > > In Wolter (1985) he shows the variance of a cluster sample with a
**> > > single stratum and then extends that to the more general example.
**> > > It turns out though in many educational assessment studies, the
**> > > single stage cluster sample is the norm and not so rare.
**> >
**> > I can see why. Thanks, I'll keep educational statistics examples in
**> > mind for those kinds of designs!
**> >
**> > --
**> > Stas Kolenikov, also found at http://stas.kolenikov.name
**> > Small print: I use this email account for mailing lists only.
**> >
**>
**>
*


Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Mon 18 Aug 2008 - 15:33:52 GMT.
