Re: [R] standardized residuals (random effects) using nlme and ranef

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Tue 01 Aug 2006 - 02:48:39 EST

Thank you for providing the reproducible example and the explanation of what you are seeking to calculate.

On 7/31/06, Dirk Enzmann <dirk.enzmann@uni-hamburg.de> wrote:
> As suggested I try another post.
>
> First I give a reproducible example. The example data set has been
> provided by I. Plewis (1997), Statistics in Education. London: Arnold (I
> include the residuals obtained by MLWin):
>
> # ---------------------------------------------
>
> library(nlme)
>
> # Example data from Plewis
>
> BaseURL="http://www.ottersbek.de/"
> ds32 = read.fwf(paste(BaseURL,'ds32.dat',sep=''),
> widths=c(9,10,10,10,10,10),
> col.names=c('class','pupil','zcurric',
> 'curric','zmath1','math2'),
> colClasses=c('factor','factor','numeric',
> 'numeric','numeric','numeric'))
>
> # Random slopes model (p. 44):
> model.4b = lme(math2~zcurric,random=~zcurric|class,method='ML',data=ds32)
>
> # Random effects (intercept and slope residuals) of level 1 (class):
> RandEff = ranef(model.4b,level=1)
>
> # Results of MLWin (c300 = intercept residuals of level "class",
> # c301 = slope residuals of level "class",
> # c304 = standardized intercept residuals,
> # c305 = standardized slope residuals):
> MLWinRes = read.fwf(paste(BaseURL,'resid_ex32.txt',sep=''),
> widths=c(15,15,15,15),
> col.names=c('c300','c301','c304','c305'),
> colClasses=c('numeric','numeric','numeric','numeric'))
>
> # They are identical to the results of MLWin (c300, c301):
> round(cbind(RandEff,MLWinRes[,1:2]),5)
>
> # However, using option "standard" the results differ considerably:
> round(cbind(ranef(model.4b,level=1,standard=T),MLWinRes[,3:4]),5)

Your summation below of the cause of this inconsistency is correct. As Harold explains the "standardized" random effects returned by ranef in the nlme package are the "estimates" (actually the "predictors") of the random effects divided by the estimated standard deviation of those random effects, not by the estimated standard error as stated in the documentation. I will correct the documentation.

That is, "standardized random effects" in the nlme package are produced by dividing all the intercept random effects by the same number (2.9723) and all the slope random effects by 2.0014.
> rr1 <- ranef(model.4b)
> rr2 <- ranef(model.4b, standard = TRUE)
> rr1/rr2

          (Intercept)  zcurric
        8    2.972373 2.001491
       12    2.972373 2.001491
       17    2.972373 2.001491
       22    2.972373 2.001491
       23    2.972373 2.001491
       28    2.972373 2.001491
       29    2.972373 2.001491
       30    2.972373 2.001491
       31    2.972373 2.001491
       32    2.972373 2.001491
       33    2.972373 2.001491
       34    2.972373 2.001491
       35    2.972373 2.001491
       36    2.972373 2.001491
       37    2.972373 2.001491
       38    2.972373 2.001491
       39    2.972373 2.001491
       41    2.972373 2.001491
       42    2.972373 2.001491
       43    2.972373 2.001491
       44    2.972373 2.001491
       45    2.972373 2.001491
       46    2.972373 2.001491
       47    2.972373 2.001491

Another way of defining a standardization would be to use what Harold Doran and others call the "posterior variance-covariance" of the random effects. On reflection I think I would prefer the term "conditional variance" but I use "posterior" below.

Strictly speaking the random effects are not parameters in the model - they are unobserved random variables. Conditional on the values of the parameters in the model and on the observed data, the random effects are normally distributed with a mean and variance-covariance that can be calculated.

Influenced by Harold I added an optional argument "postVar" to the ranef extractor function for lmer models (but apparently I failed to document it - I will correct that). If you fit your model with lmer, as you do below, you can standardize the random effects according to the conditional variances by

> fm1 <- lmer(math2 ~ zcurric + (zcurric|class), ds32, method='ML')
> rr <- ranef(model.4b, postVar = TRUE)
> str(rr) # shows the structure of the object rr
Formal class 'ranef.lmer' [package "Matrix"] with 1 slots

  ..@ .Data:List of 1
  .. ..$ :`data.frame':	24 obs. of  2 variables:
  .. .. ..$ (Intercept): num [1:24]  0.860 -1.962 -1.105  4.631 -0.781 ...
  .. .. ..$ zcurric    : num [1:24]  0.2748 -1.3194  0.0841  0.5958  0.8080 ...
  .. .. ..- attr(*, "postVar")= num [1:2, 1:2, 1:24]  2.562 -0.585
-0.585  1.837  3.027 ...

The two sets of conditional standard deviations are

> sqrt(attr(rr1, "postVar")[1,1,])

 [1] 1.600589 1.739768 1.466261 1.542172 1.942629 1.506112 1.892815 1.665414
 [9] 1.377917 1.574244 1.755679 2.039292 1.887651 1.451916 1.732030 1.659786
[17] 1.987397 1.795273 1.542049 1.784441 1.629663 1.753223 1.585520 1.470161

> sqrt(attr(rr1, "postVar")[2,2,])
 [1] 1.355537 1.630075 1.506560 1.378282 1.790906 1.310406 1.341185 1.512708
 [9] 1.397492 1.808535 0.859909 1.867273 1.626857 1.669984 1.520560 1.489380
[17] 1.989123 1.326610 1.613399 1.074365 1.764128 1.455202 1.489311 1.942117

However, as you have seen, standardizing the conditional means by these conditional standard deviations still does not reproduce the results from MLWin.

If you can determine what is being calculated by MLWin or if you can determine that there is a bug in the lmer calculations I would be delighted to hear about it.

Thanks again for a very thorough report and for the reproducible example.

> # ---------------------------------------------
>
> From the RSiteSearch("MLWin") there are two hits that seem to be
> relevant: One of Douglas Bates
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/53097.html
>
> where he explains why he regards getting standard errors of the
> estimates of variance components as not being important. I think (but
> I'm not sure) that this implies that I should not use standardized
> residuals, as well.
>
> Secondly a post of Roel de Jong
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/62280.html
>
> However, I can't figure out how I could make use of his suggestion to
> obtain the standardized residuals I am looking for.
>
> A part of the answer has been given in an answer to my previous post by
> Harold Doran. He showed that the so called "standardized residuals"
> obtained by ranef() using the option "standard=T" are the residuals
> devided by the standard deviation of the random effects, not divided by
> the "corresponding standard error" as stated in the ranef() help - if I
> understood him correctly.
>
> In the multilevel list he also showed how to create a caterpillar plot
> using lmer() and the errbar() function of Hmisc (I modified his example
> to adapt it to my data):
>
> # ------------------------------------
>
> detach(package:nlme)
> library(Matrix)
> library(Hmisc)
>
> # Fit a sample model:
> fm1 = lmer(math2 ~ zcurric + (zcurric|class), ds32, method='ML',
> control=list(gradient=FALSE, niterEM=0))
>
> .Call("mer_gradComp", fm1, PACKAGE = "Matrix")
>
> # extract random effects:
> fm1.blup = ranef(fm1)$class
>
> #extract variances and multiple by scale factor:
> fm1.post.var = fm1@bVar$class * attr(VarCorr(fm1),"sc")**2
>
> # posterior variance of slope on fm1:
> fm1.post.slo = sqrt(fm1.post.var[2,2,])
>
> # Slopes:
> x = fm1.blup[,2]+outer(fm1.post.slo, c(-1.96,0,1.96))
>
> # This code creates a catepillar plot using the errbar function:
> x <- x[order(x[,2]) , ]
> print(errbar(1:dim(x)[1], x[,2],x[,3],x[,1],ylab='Slopes', xlab='Classes'))
> abline(h=0)
>
> # ------------------------------------
>
> Is it correct that I can obtain a respecive plot for the intercepts in a
> similar way?
>
> # ------------------------------------
>
> # posterior variance of intercept on fm1.post.int:
> fm1.post.int = sqrt(fm1.post.var[1,1,])
>
> # Intercepts:
> x = fm1.blup[,1]+outer(fm1.post.int, c(-1.96,0,1.96))
>
> # This code creates a catepillar plot using the errbar function
> x <- x[order(x[,2]) , ]
> print(errbar(1:dim(x)[1], x[,2],x[,3],x[,1],ylab='Intercepts',
> xlab='Classes'))
> abline(h=0)
>
> # ------------------------------------
>
> To sum up, I can't figure out how MLWin calculates the standardized
> residuals. But I understand that this is not a question for the R list.
> Nevertheless, it would help if someone could point me to some arguments
> why not to use them and stick to the results obtainable by ranef().
>
> Spencer Graves wrote:
>
> > Have you tried RSiteSearch("MLWin")? I just got 29 hits. I
> > wonder if any one of these might relate to your question?
> >
> > If you would like more help on this issue from this listserve,
> > please submit another post, preferably illustrating your question with
> > the simplest possible self-contained example that illustrates your
> > question, perhaps like the following:
> >
> > fm1.16 <- lme(distance~age, data=Orthodont[1:16,],
> > random=~age|Subject)
> >
> > Hope this helps.
> > Spencer Graves
> > p.s. PLEASE do read the posting guide
> > "www.R-project.org/posting-guide.html" and provide commented, minimal,
> > self-contained, reproducible code.
> >
> > Dirk Enzmann wrote:
> >
> >> Using ranef() (package nlme, version 3.1-75) with an 'lme' object I
> >> can obtain random effects for intercept and slope of a certain level
> >> (say: 1) - this corresponds to (say level 1) "residuals" in MLWin.
> >> Maybe I'm mistaken here, but the results are identical.
> >>
> >> However, if I try to get the standardized random effects adding the
> >> paramter "standard=T" to the specification of ranef(), the results
> >> differ considerably from the results of MLWin (although MLWin defines
> >> "standardized" in the same way as "divided by its estimated
> >> (diagnostic) standard error").
> >>
> >> Why do the results differ although the estimates (random effects and
> >> thus their variances) are almost identical? I noticed that lme() does
> >> not compute the standard errors of the variances of the random effects
> >> - for several reasons, but if this is true, how does ranef() calculate
> >> the standardized random effects (the help says: '"standardized" (i.e.
> >> divided by the corresponding estimated standard error)').
> >>
> >> Is there a way to obtain similar results as in MLWin (or: should I
> >> prefer the results of ranef() for certain reasons)?
> >>
> >> Dirk
> >>
> >> -----------------------------
> >> R version: 2.3.1 Patched (2006-06-21 r38367)
>
> --
> *************************************************
> Dr. Dirk Enzmann
> Institute of Criminal Sciences
> Dept. of Criminology
> Edmund-Siemers-Allee 1
> D-20146 Hamburg
> Germany
>
> phone: +49-(0)40-42838.7498 (office)
> +49-(0)40-42838.4591 (Billon)
> fax: +49-(0)40-42838.2344
> email: dirk.enzmann@uni-hamburg.de
> www:
> http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Aug 01 06:34:02 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 01 Aug 2006 - 08:18:19 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.