# Re: [R] Components of variance

From: Douglas Bates <dmbates_at_gmail.com>
Date: Mon 27 Jun 2005 - 04:40:36 EST

On 6/26/05, Spencer Graves <spencer.graves@pdf.com> wrote:
> Hi, Renaud:
>
> Thanks for the link to Doug Bates' recent comment on this.
> Unfortunately, the information contained therein is not enough for me to
> easily obtain all the required standard errors and write code to make
> confidence intervals. To be more specific, I'm able to produce the same
> answers as before using lmer in place of lme as follows:
>
> > library(lme4)
> > set.seed(2)
> > lot <- rep(LETTERS[1:3], each=9)
> > lot.e <- rep(rnorm(3), each=9)
> > wf <- paste(lot, rep(1:9, each=3))
> > wf.e <- rep(rnorm(9), each=3)
> > DF <- data.frame(lot=lot, wafer=wf,
> + Vt=lot.e+wf.e+rnorm(27))
> > (fitr <- lmer(Vt~1+(1|lot)+(1|wafer), DF))
> <snip>
> Random effects:
> Groups Name Variance Std.Dev.
> wafer (Intercept) 0.96054 0.98007
> lot (Intercept) 1.51434 1.23058
> Residual 1.34847 1.16124
> # of obs: 27, groups: wafer, 9; lot, 3
>
> Fixed effects:
> Estimate Std. Error DF t value Pr(>|t|)
> (Intercept) 0.60839 0.81329 26 0.7481 0.4611
>
> These numbers agree with what I got with lme. I then followed your
> suggetion to "http://tolstoy.newcastle.edu.au/R/help/05/06/7291.html".
> This suggested I try str(VarCorr(fitr)):
>
> > vcFit <- VarCorr(fitr)
> > str(vcFit)
> Formal class 'VarCorr' [package "Matrix"] with 3 slots
> ..@ scale : num 1.16
> ..@ reSumry :List of 2
> .. ..\$ wafer:Formal class 'corrmatrix' [package "Matrix"] with 2 slots
> .. .. .. ..@ .Data : num [1, 1] 1
> .. .. .. .. ..- attr(*, "dimnames")=List of 2
> .. .. .. .. .. ..\$ : chr "(Intercept)"
> .. .. .. .. .. ..\$ : chr "(Intercept)"
> .. .. .. ..@ stdDev: Named num 0.844
> .. .. .. .. ..- attr(*, "names")= chr "(Intercept)"
> .. ..\$ lot :Formal class 'corrmatrix' [package "Matrix"] with 2 slots
> .. .. .. ..@ .Data : num [1, 1] 1
> .. .. .. .. ..- attr(*, "dimnames")=List of 2
> .. .. .. .. .. ..\$ : chr "(Intercept)"
> .. .. .. .. .. ..\$ : chr "(Intercept)"
> .. .. .. ..@ stdDev: Named num 1.06
> .. .. .. .. ..- attr(*, "names")= chr "(Intercept)"
> ..@ useScale: logi TRUE
> >
>
> Unfortunately, I've been so far unable to translate this into
> confidence intervals for the variance components that seem to match
> intervals(lme(...)). The closest I came was as follows:
>
> > 1.23058*sqrt(exp(qnorm(c(0.025, 0.975))*vcFit@reSumry\$lot@stdDev))
> [1] 0.4356044 3.4763815
> > 0.98007*sqrt(exp(qnorm(c(0.025, 0.975))*vcFit@reSumry\$wafer@stdDev))
> [1] 0.4286029 2.2410887
> >
> The discrepancy between these numbers and intervals(lme(...)) is 25%
> for "lot", which suggest that I'm doing something wrong. Moreover, I
> don't know how to get a similar interval for "scale", and I don't know
> how to access the estimated variance components other than manually.
>
> Thanks,
> spencer graves
> p.s. R 2.1.0 patched under Windows XP.
>
> Renaud Lancelot wrote:
>
> > Spencer Graves a écrit :
> >
> >> As far as I know, the best available is lme in library(nlme). For
> >>
> >> Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus
> >>(Springer)
> >>
> >> Consider the following example:
> >>
> >> > set.seed(2)
> >> > lot <- rep(LETTERS[1:3], each=9)
> >> > lot.e <- rep(rnorm(3), each=9)
> >> > wf <- paste(lot, rep(1:9, each=3))
> >> > wf.e <- rep(rnorm(9), each=3)
> >> > DF <- data.frame(lot=lot, wafer=wf,
> >>+ Vt=lot.e+wf.e+rnorm(27))
> >> > (fit <- lme(Vt~1, random=~1|lot/wafer, DF))
> >>Linear mixed-effects model fit by REML
> >> Data: DF
> >> Log-restricted-likelihood: -48.44022
> >> Fixed: Vt ~ 1
> >>(Intercept)
> >> 0.6083933
> >>
> >>Random effects:
> >> Formula: ~1 | lot
> >> (Intercept)
> >>StdDev: 1.230572
> >>
> >> Formula: ~1 | wafer %in% lot
> >> (Intercept) Residual
> >>StdDev: 0.9801403 1.161218
> >>
> >>Number of Observations: 27
> >>Number of Groups:
> >> lot wafer %in% lot
> >> 3 9
> >> > (CI.fit <- intervals(fit))
> >>Approximate 95% confidence intervals
> >>
> >> Fixed effects:
> >> lower est. upper
> >>(Intercept) -1.100281 0.6083933 2.317068
> >>attr(,"label")
> >>[1] "Fixed effects:"
> >>
> >> Random Effects:
> >> Level: lot
> >> lower est. upper
> >>sd((Intercept)) 0.3368174 1.230572 4.495931
> >> Level: wafer
> >> lower est. upper
> >>sd((Intercept)) 0.426171 0.9801403 2.254201
> >>
> >> Within-group standard error:
> >> lower est. upper
> >>0.8378296 1.1612179 1.6094289
> >> > str(CI.fit)
> >>List of 3
> >> \$ fixed : num [1, 1:3] -1.100 0.608 2.317
> >> ..- attr(*, "dimnames")=List of 2
> >> .. ..\$ : chr "(Intercept)"
> >> .. ..\$ : chr [1:3] "lower" "est." "upper"
> >> ..- attr(*, "label")= chr "Fixed effects:"
> >> \$ reStruct:List of 2
> >> ..\$ lot :`data.frame': 1 obs. of 3 variables:
> >> .. ..\$ lower: num 0.337
> >> .. ..\$ est. : num 1.23
> >> .. ..\$ upper: num 4.5
> >> ..\$ wafer:`data.frame': 1 obs. of 3 variables:
> >> .. ..\$ lower: num 0.426
> >> .. ..\$ est. : num 0.98
> >> .. ..\$ upper: num 2.25
> >> ..- attr(*, "label")= chr "Random Effects:"
> >> \$ sigma : atomic [1:3] 0.838 1.161 1.609
> >> ..- attr(*, "label")= chr "Within-group standard error:"
> >> - attr(*, "level")= num 0.95
> >> - attr(*, "class")= chr "intervals.lme"
> >> > diff(log(CI.fit\$sigma))
> >> est. upper
> >>0.32641 0.32641
> >>
> >> The last line combined with help for intervals.lme shows that the
> >>confidence interval for sigma (and doubtless also for lot and wafer
> >>variance components is based on a normal approximation for the
> >>distribution of log(sigma).
> >>
> >> The state of the art is reflected in "lmer" in library(lme4),
> >>described in the following:
> >>
> >> Doug Bates (2005) "Fitting linear mixed models in R" in R News 5/1
> >>available from "www.r-project.org" -> Newsletter
> >>
> >> However, an "intervals" function is not yet available for "lmer"
> >>objects.
> >
> >
> > The issue of variance components in lme4 was recently by D. Bates:
> >
> > http://tolstoy.newcastle.edu.au/R/help/05/06/7291.html
> >
> > Also, a colleague wrote (in French) a nice report - with data and R code
> > on how to compute variance components, their bias and distribution with
> > lme (normal approximation , Monte Carlo simulation and delta method).
> > Let me know if you want it.
> >
> > Best,
> >
> > Renaud
> >
> >
> >
> >>
> >> spencer graves
> >>
> >>John Sorkin wrote:
> >>
> >>
> >>
> >>>Could someone identify a function that I might use to perform a
> >>>components of variance analysis? In addition to the variance
> >>>attributable to each factor, I would also like to obtain the SE of the
> >>>variances.
> >>>Thank you,
> >>>John
> >>>
> >>>John Sorkin M.D., Ph.D.
> >>>Chief, Biostatistics and Informatics
> >>>Baltimore VA Medical Center GRECC and
> >>>University of Maryland School of Medicine Claude Pepper OAIC
> >>>
> >>>University of Maryland School of Medicine
> >>>Division of Gerontology
> >>>Baltimore VA Medical Center
> >>>10 North Greene Street
> >>>GRECC (BT/18/GR)
> >>>Baltimore, MD 21201-1524
> >>>
> >>>410-605-7119
> >>>----- NOTE NEW EMAIL ADDRESS:
> >>>jsorkin@grecc.umaryland.edu

However, I do not regard getting standard errors of the estimates of variance components as being important. In fact I think I am doing a service to the community by *not* providing them because they are inevitably misused.

Think back to your first course in statistics when confidence intervals were introduced. You learned that a confidence interval on the mean of a population (normally distributed or reasonably close to normally distributed) is derived as the sample mean plus or minus a multiple of the standard error of the mean. What about a confidence interval on the variance of such a population? If you calculated that as the estimate of the variance plus or minus a multiple of the standard error of that estimate then you probably lost points on that part of the assignment or exam because we don't calculate a confidence interval in that way. We know that the distribution of S^2 is a scaled chi-squared distribution and often very far from symmetric and a confidence interval constructed in the manner described above would have poor properties and would not reflect the extent of our uncertainty about the parameter. And we never discussed at all a hypothesis test on whether the population variance was zero.

Now let's go to a much more complicated situation with mixed effects models. We know that in the simplest possible case it would be extremely misleading to create confidence intervals or perform hypothesis tests based on an estimate of a variance and a standard error of that estimate but we are perfectly happy to do so in the much more complicated situation of variance components or, even worse, estimates of covariances. It doesn't make sense. Even though SAS PROC MIXED and HLM and MLWiN and Stata and ... all produce such tests and such confidence intervals for mixed-effects models they still don't make sense. There is no good justification for their use other than a vague hand-waving about asymptotics.

Until someone can explain to me a good use of quantities such as standard errors of estimates of variances or covariances, I'm not going to stop working on all of the other aspects of the code and documentation and redirect my effort to producing these. If anyone wants to do it they can contact me off-list and I'll explain the optimization problem and how to get access to the parameterization and the gradient. If you are really energetic I can point you to the formulae for an analytic Hessian that I haven't yet had time to implement.

R-help@stat.math.ethz.ch mailing list