Re: [R] Conservative "ANOVA tables" in lmer

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Mon 18 Sep 2006 - 18:09:29 GMT

On 9/12/06, Douglas Bates <bates@stat.wisc.edu> wrote:
> On 9/11/06, Manuel Morales <Manuel.A.Morales@williams.edu> wrote:
[snip]
> > Am I right that the MCMC sample can not be used, however, to evaluate
> > the significance of parameter groups. For example, to assess the
> > significance of a three-level factor? Are there better alternatives than
> > simply adjusting the CI for the number of factor levels
> > (1-alpha/levels).
>
> Hmm - I'm not sure what confidence interval and what number of levels
> you mean there so I can't comment on that method.
>
> Suppose we go back to Spencer's example and consider if there is a
> signficant effect for the Nozzle factor. That is equivalent to the
> hypothesis H_0: beta_2 = beta_3 = 0 versus the general alternative. A
> "p-value" could be formulated from an MCMC sample if we assume that
> the marginal distribution of the parameter estimates for beta_2 and
> beta_3 has roughly elliptical contours and you can evaluate that by,
> say, examining a hexbin plot of the values in the MCMC sample. One
> could take the ellipses as defined by the standard errors and
> estimated correlation or, probably better, by the observed standard
> deviations and correlations in the MCMC sample. Then determine the
> proportion of (beta_2, beta_3) pairs in the sample that fall outside
> the ellipse centered at the estimates and with that eccentricity and
> scaling factors that passes through (0,0). That would be an empirical
> p-value for the test.
>
> I would recommend calculating this for a couple of samples to check on
> the reproducibility.

There was some follow-up on this thread, including some code and results that I find encouraging. I didn't notice that the R-help list had been dropped off the cc: in later exchanges. I enclose my contribution to the conversation so that others on the list will get to see it.

As soon as I described the idea I knew that someone would ask for a function to perform it. Here's one

mcmcpvalue <- function(samp)
{

   ## elementary version that creates an empirical p-value for the
   ## hypothesis that the columns of samp have mean zero versus a
   ## general multivariate distribution with elliptical contours.

   ## differences from the mean standardized by the observed    ## variance-covariance factor
   std <- backsolve(chol(var(samp)),

                    cbind(0, t(samp)) - colMeans(samp),
                    transpose = TRUE)

   sqdist <- colSums(std * std)
   sum(sqdist[-1] > sqdist[1])/nrow(samp) }

At least I think I have the standardization by the Cholesky factor of the observed variance-covariance matrix correct. However I always manage to confuse myself on that calculation so please let me know if I have it wrong.

As an example, consider a model fit to the AvgDailyGain data from the SASmixed package.
> library(nlme)
> data(AvgDailyGain, package = "SASmixed")
> summary(fm1Adg <- lme(adg ~ InitWt*Treatment - 1, AvgDailyGain, random = ~1|Block))
Linear mixed-effects model fit by REML
 Data: AvgDailyGain

      AIC BIC logLik
 85.32685 97.10739 -32.66342

Random effects:
 Formula: ~1 | Block

       (Intercept) Residual
StdDev: 0.5092266 0.2223268

Fixed effects: adg ~ InitWt * Treatment - 1

                       Value Std.Error DF    t-value p-value
InitWt              0.0022937 0.0017473 17  1.3126947  0.2067
Treatment0          0.4391370 0.7110881 17  0.6175564  0.5451
Treatment10         1.4261187 0.6375458 17  2.2368880  0.0390
Treatment20         0.4796285 0.5488867 17  0.8738206  0.3944
Treatment30         0.2001071 0.7751989 17  0.2581365  0.7994
InitWt:Treatment10 -0.0012108 0.0023326 17 -0.5190774  0.6104
InitWt:Treatment20 0.0010720 0.0021737 17 0.4931507 0.6282 InitWt:Treatment30 0.0021543 0.0027863 17 0.7731996 0.4500  Correlation:
                  InitWt Trtmn0 Trtm10 Trtm20 Trtm30 IW:T10 IW:T20
Treatment0         -0.961
Treatment10         0.034  0.039
Treatment20         0.003  0.080  0.334
Treatment30         0.050  0.011  0.097  0.043
InitWt:Treatment10 -0.772  0.742 -0.631 -0.164 -0.059
InitWt:Treatment20 -0.806 0.775 -0.180 -0.555 -0.019 0.724 InitWt:Treatment30 -0.666 0.640 -0.046 0.024 -0.754 0.529 0.520

Standardized Within-Group Residuals:

       Min Q1 Med Q3 Max -1.82903364 -0.44913967 -0.03023488 0.44738506 1.59877700

Number of Observations: 32
Number of Groups: 8
> anova(fm1Adg)

                numDF denDF  F-value p-value
InitWt               1    17 91.68230  <.0001
Treatment            4    17  8.81312  0.0005
InitWt:Treatment     3    17  0.93118  0.4471

Fitting the same model in lmer then generating an MCMC sample and testing for the three interaction coefficients being zero would look like

> data(AvgDailyGain, package = "SASmixed")
> (fm1Adg <- lmer(adg ~ InitWt*Treatment - 1 + (1|Block), AvgDailyGain))
Linear mixed-effects model fit by REML
Formula: adg ~ InitWt * Treatment - 1 + (1 | Block)   Data: AvgDailyGain
  AIC BIC logLik MLdeviance REMLdeviance  83.33 96.52 -32.66 10.10 65.33 Random effects:
 Groups Name Variance Std.Dev.
 Block (Intercept) 0.25930 0.50922
 Residual 0.04943 0.22233
number of obs: 32, groups: Block, 8

Fixed effects:

                   Estimate Std. Error t value
InitWt              0.002294   0.001747  1.3127
Treatment0          0.439128   0.711092  0.6175
Treatment10         1.426113   0.637549  2.2369
Treatment20         0.479621   0.548889  0.8738
Treatment30         0.200115   0.775204  0.2581
InitWt:Treatment10 -0.001211   0.002333 -0.5191
InitWt:Treatment20 0.001072 0.002174 0.4931 InitWt:Treatment30 0.002154 0.002786 0.7732

Correlation of Fixed Effects:

           InitWt Trtmn0 Trtm10 Trtm20 Trtm30 IW:T10 IW:T20

Treatment0  -0.961
Treatment10  0.034  0.039
Treatment20  0.003  0.080  0.334
Treatment30  0.050  0.011  0.097  0.043
IntWt:Trt10 -0.772  0.742 -0.631 -0.164 -0.059
IntWt:Trt20 -0.806  0.775 -0.180 -0.555 -0.019  0.724
IntWt:Trt30 -0.666  0.640 -0.046  0.024 -0.754  0.529  0.520

> AdgS1 <- mcmcsamp(fm1Adg, 50000)
> library(coda)
> HPDinterval(AdgS1)
                         lower        upper
InitWt             -0.001402986  0.006164517
Treatment0         -1.144313821  1.925351624
Treatment10         0.047128188  2.776309715
Treatment20        -0.761106539  1.602588258
Treatment30        -1.440932884  1.887536852
InitWt:Treatment10 -0.006309270  0.003569772
InitWt:Treatment20 -0.003578127  0.005614077
InitWt:Treatment30 -0.004003873  0.008055591
log(sigma^2)       -3.617259562 -2.198161379
log(Blck.(In))     -2.438121552  0.002976392
deviance           12.769059400 32.841699073
attr(,"Probability")
[1] 0.95
> mcmcpvalue(as.matrix(AdgS1)[, 6:8])

[1] 0.46876

If we drop the interaction term and consider the treatment term the test becomes

> (fm2Adg <- lmer(adg ~ InitWt + Treatment + (1|Block), AvgDailyGain))
Linear mixed-effects model fit by REML
Formula: adg ~ InitWt + Treatment + (1 | Block)   Data: AvgDailyGain
  AIC BIC logLik MLdeviance REMLdeviance  48.34 57.13 -18.17 13.62 36.34 Random effects:
 Groups Name Variance Std.Dev.
 Block (Intercept) 0.24084 0.49076
 Residual 0.05008 0.22379
number of obs: 32, groups: Block, 8

Fixed effects:

            Estimate Std. Error t value
(Intercept) 0.2490338  0.3776318   0.659
InitWt      0.0027797  0.0008334   3.336
Treatment10 0.4835075  0.1128530   4.284
Treatment20 0.4639445  0.1120505   4.140
Treatment30 0.5520737  0.1148132   4.808

Correlation of Fixed Effects:
           (Intr) InitWt Trtm10 Trtm20
InitWt      -0.863
Treatment10 -0.035 -0.130

Treatment20 -0.102 -0.053 0.502
Treatment30 -0.338 0.224 0.454 0.475
> AdgS2 <- mcmcsamp(fm2Adg, 50000)
> HPDinterval(AdgS2)
                     lower        upper
(Intercept)    -0.579312545  1.044946476
InitWt          0.001114375  0.004616652
Treatment10     0.246254015  0.714185069
Treatment20     0.220236717  0.686356392
Treatment30     0.316078702  0.797956531
log(sigma^2)   -3.553700311 -2.280238850
log(Blck.(In)) -2.448486301 -0.072657242
deviance       14.687593543 29.699825900
attr(,"Probability")
[1] 0.95
> mcmcpvalue(as.matrix(AdgS2[,3:5]))

[1] 0.00026

so these p-values seem to be in line with the results from the analysis of variance p-values using one of the formula for calculation of a residual degrees of freedom.

Related to the other question of the use of a likelihood ratio test for the fixed effects, the p-values for the likelihood ratio tests seem quite different

> anova(fm2Adg, fm1Adg)

Data: AvgDailyGain

Models:
fm2Adg: adg ~ InitWt + Treatment + (1 | Block)
fm1Adg: adg ~ InitWt * Treatment - 1 + (1 | Block)
      Df    AIC    BIC logLik  Chisq Chi Df Pr(>Chisq)
fm2Adg  6 25.623 34.417 -6.812
fm1Adg  9 28.098 41.290 -5.049 3.5249      3     0.3176

> fm3Adg <- lmer(adg ~ InitWt + (1|Block), AvgDailyGain)
> anova(fm2Adg, fm3Adg)

Data: AvgDailyGain

Models:
fm3Adg: adg ~ InitWt + (1 | Block)
fm2Adg: adg ~ InitWt + Treatment + (1 | Block)
      Df     AIC     BIC  logLik Chisq Chi Df Pr(>Chisq)
fm3Adg  3  41.913  46.310 -17.957
fm2Adg  6  25.623  34.417  -6.812 22.29      3  5.677e-05 ***

and it is not just a matter of the evaluation of the log-likelihood

> fm2AdgML <- update(fm2Adg, method = "ML")
> fm3AdgML <- update(fm3Adg, method = "ML")
> anova(fm3AdgML, fm2AdgML)

Data: AvgDailyGain
Models:
fm3AdgML: adg ~ InitWt + (1 | Block)
fm2AdgML: adg ~ InitWt + Treatment + (1 | Block)

        Df AIC BIC logLik Chisq Chi Df Pr(>Chisq) fm3AdgML 3 41.882 46.279 -17.941
fm2AdgML 6 25.617 34.412 -6.809 22.265 3 5.746e-05 ***



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Sep 19 05:14:55 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 18 Sep 2006 - 19:30:05 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.