[R] lmer p-vales are sometimes too small

From: Olof Leimar <olof.leimar_at_zoologi.su.se>
Date: Sat 07 Jan 2006 - 02:27:31 EST


This concerns whether p-values from lmer can be trusted. From simulations, it seems that lmer can produce very small, and probably spurious, p-values. I realize that lmer is not yet a finished product. Is it likely that the problem will be fixed in a future release of the lme4 package?

Using simulated data for a quite standard mixed-model anova (a balanced two-way design; see code for the function SimMixed pasted below), I compared the output of lmer, for three slightly different models, with the output of aov. For an example where there is no fixed treatment effect (null hypothesis is true), with 4 blocks, 2 treatments, and 40 observations per treatment-block combination, I find that lmer gives more statistical significances than it should, whereas aov does not have this problem. An example of output I generated by calling   > SimMixed(1000)
is the following:

Proportion significances at the 0.05 level

aov:     0.05
lmer.1:  0.148
lmer.2:  0.148
lmer.3:  0.151

Proportion significances at the 0.01 level
aov:     0.006
lmer.1:  0.076

lmer.2: 0.076
lmer.3: 0.077

Proportion significances at the 0.001 level

aov:     0.001
lmer.1:  0.047
lmer.2:  0.047
lmer.3:  0.047

which is based on 1000 simulations (and takes about 5 min on my PowerMac G5). The different models fitted are:

fm.aov <- aov(y ~ Treat + Error(Block/Treat), data = dat)
fm.lmer.1 <- lmer(y ~ Treat + (Treat|Block), data = dat)
fm.lmer.2 <- lmer(y ~ Treat + (Treat-1|Block), data = dat)
fm.lmer.3 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat)

It seems that, depending on the level of the test, lmer gives between a factor of 3 to a factor of around 50 times too many significances. The first two lmer models seem to give identical results, whereas the third (which I think perhaps is the one that best represents the data generated by the simulation) differs slightly. In running the simulations, warnings like this are occasionally generated:

Warning message:
optim or nlminb returned message false convergence (8)   in: "LMEoptimize<-"(`*tmp*`, value = list(maxIter = 200, tolerance = 1.49011611938477e-08,

They seem to derive from the third of the lmer models. Perhaps there is some numerical issue in the lmer function? From running SimMixed() several times, I have noticed that large p-values (say, larger than 0.5) agree very well between lmer and aov, but there seems to be a systematic discrepancy for smaller p-values, where lmer gives smaller values than aov. The F-values agree between all analyzes (except for fm.lmer.3 when there is a warning), so there is a systematic difference between lmer and aov in how a p-value is obtained from the F-value, which becomes severe for small p-values.

My output from sessionInfo()

R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0

attached base packages:
[1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" "base"

other attached packages:

      lme4 lattice Matrix
  "0.98-1" "0.12-11" "0.99-3"

Pasted code for the SimMixed function (some lines might wrap):

# This function generates n.sims random data sets for a design with 4
# blocks, 2 treatments applied to each block, and 40 replicate
# observations for each block-treatment combination. There is no true
# fixed treatment effect, so a statistical significance of a test for
# a fixed treatment effect ought to occur with a probability equal to
# the nominal level of the test. Four tests are applied to each
# simulated data set: the classical aov and three versions of lmer,
# corresponding to different model formulations. The proportion of
# tests for a fixed treatment effect that become significant at the
# 0.05 0.01 and 0.001 levels are printed, as well as the p-values for
# the last of the simulations. In my runs, lmer gives significance
# more often than indicated by the nominal level, for each of the
# three models, whereas aov is OK. The package lme4 needs to be loaded
# to run the code.

SimMixed <- function(n.sims = 1) {

   k <- 4                # number of blocks
   n <- 40               # num obs per block X treatment combination
   m1 <- 1.0             # fixed effect of level 1 of treatment
   m2 <- m1              # fixed effect of level 2 of treatment
   sd.block <- 0.5       # SD of block random effect
   sd.block.trt <- 1.0   # SD of random effect for block X treatm
   sd.res <- 0.1         # Residual SD
   Block <- factor( rep(1:k, each=2*n) )
   Treat <- factor( rep( rep(c("Tr1","Tr2"), k), each=n) )    m <- rep( rep(c(m1, m2), k), each=n) # fixed effects    # storage for p-values
   p.aov <- rep(0, n.sims)
   p.lmer.1 <- rep(0, n.sims)
   p.lmer.2 <- rep(0, n.sims)
   p.lmer.3 <- rep(0, n.sims)
   for (i in 1:n.sims) {
     # first get block and treatment random deviations
     b <- rep( rep(rnorm(k, 0, sd.block), each=2) +
              rnorm(2*k, 0, sd.block.trt), each=n )
     # then get response
     y <- m + b + rnorm(2*k*n, 0, sd.res)
     dat <- data.frame(Block, Treat, y)
     # perform the tests
     fm.aov <- aov(y ~ Treat+Error(Block/Treat), data = dat)
     fm.lmer.1 <- lmer(y ~ Treat+(Treat|Block), data = dat)
     fm.lmer.2 <- lmer(y ~ Treat+(Treat-1|Block), data = dat)
     fm.lmer.3 <- lmer(y ~ Treat+(1|Block)+(Treat-1|Block), data = dat)
     # store the p-values
     p.aov[i] <- summary(fm.aov)$"Error: Block:Treat"[[1]]$"Pr(>F)"[1]
     p.lmer.1[i] <- anova(fm.lmer.1)[6]
     p.lmer.2[i] <- anova(fm.lmer.2)[6]
     p.lmer.3[i] <- anova(fm.lmer.3)[6]

   }
   cat("\nProportion significances at the 0.05 level \n")
   cat("aov:    ", sum(p.aov<0.05)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.05)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.05)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.05)/n.sims, "\n")

   cat("\nProportion significances at the 0.01 level \n")
   cat("aov:    ", sum(p.aov<0.01)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.01)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.01)/n.sims, "\n")    cat("lmer.3: ", sum(p.lmer.3<0.01)/n.sims, "\n")
   cat("\nProportion significances at the 0.001 level \n")
   cat("aov:    ", sum(p.aov<0.001)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.001)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.001)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.001)/n.sims, "\n")

   cat("\nFinal aov analysis: \n")
   print(summary(fm.aov)$"Error: Block:Treat")    cat("\nFinal lmer analysis 1: \n")
   print(anova(fm.lmer.1))
   cat("\nFinal lmer analysis 2: \n")
   print(anova(fm.lmer.2))
   cat("\nFinal lmer analysis 3: \n")
   print(anova(fm.lmer.3))
}

-- 
Olof Leimar, Professor
Department of Zoology
Stockholm University
SE-106 91 Stockholm
Sweden

olof.leimar@zoologi.su.se
http://www.zoologi.su.se/research/leimar/

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jan 07 02:39:55 2006

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:41:54 EST