Re: [R] logistic mixed effects models with lmer

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Fri, 28 Dec 2007 13:55:33 -0600

On Dec 28, 2007 9:35 AM, Sharon Goldwater <sgwater_at_stanford.edu> wrote:
> I have a question about some strange results I get when using lmer to
> build a logistic mixed effects model. I have a data set of about 30k
> points, and I'm trying to do backwards selection to reduce the number
> of fixed effects in my model. I've got 3 crossed random effects and
> about 20 or so fixed effects. At a certain point, I get a model (m17)
> where the fixed effects are like this (full output is at end of
> message):
>
> > print(m17, corr=F)
> ...
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.97887 0.19699 -10.045 < 2e-16 ***
> sexM 0.45553 0.14387 3.166 0.001544 **
> ...
> is_discTRUE 0.24676 0.15204 1.623 0.104576
> poly(wfreq, 2)1 -119.72397 11.00516 -10.879 < 2e-16 ***
> poly(wfreq, 2)2 17.35646 5.44456 3.188 0.001433 **
> poly(wlen_p, 2)1 -13.60798 7.26926 -1.872 0.061208 .
> poly(wlen_p, 2)2 -6.43167 5.24119 -1.227 0.219770
> ...
>
> where poly(wlen_p,2)2 is the least significant factor left in the
> model. So I then build a model (m18) with exactly the same random and
> fixed effects except removing poly(wlen_p,2)2. Then I do an anova,
> and I get:
>
> > anova(m17,m18)
> Data:
> Models:
> m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m17: after_part + first_rep + is_open + is_disc + poly(wfreq,
> m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
> m17: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
> m18: (1 | speaker) + (1 | corpus) + (1 | ref)
> m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m18: after_part + first_rep + is_open + is_disc + poly(wfreq,
> m17: 2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) +
> m18: pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
> m17: 2) + (1 | speaker) + (1 | corpus) + (1 | ref)
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m18 27 25928 26153 -12937
> m17 28 25925 26159 -12934 5.2136 1 0.02241 *

> So my first question is: Should I be concerned that the significance
> level shown in the original m17 is so different from the one shown by
> the anova? It's hard for me to see how this could happen. I noticed
> that there is a post on the FAQ about significance levels in linear
> mixed models, but I'm not sure whether it applies to the logistic
> case and if so how.

It can happen that the significance levels reported by the different tests will be different. The test in the summary is based on a local approximation to the conditional mean and assumes that the variances of the random effects will not change substantially when fitting the model with and without that term. Did they?

One thing I noticed is your output (and thank you for including that) is that it reports that corpus has only two levels. Is that correct? If so, I would not advise using a random effect for corpus. It is very difficult to estimate a variance from only two levels of a factor. I suggest using a fixed effect for corpus instead.

> Now, my second question is a result of removing one more factor
> (is_disc) from the model, creating m19. I do another anova:
>
> > anova(m19,m18)
> Data:
> Models:
> m19: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m18: after_part + first_rep + is_open + poly(wfreq, 2) + wlen_p +
> m19: poly(utt_rate, 2) + poly(dur, 2) + pmean + poly(log_prange,
> m18: 2) + poly(imean, 2) + poly(irange, 2) + (1 | speaker) + (1 |
> m19: corpus) + (1 | ref)
> m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m19: after_part + first_rep + is_open + is_disc + poly(wfreq,
> m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
> m19: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
> m18: (1 | speaker) + (1 | corpus) + (1 | ref)
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m19 26 25925 26142 -12936
> m18 27 25928 26153 -12937 0 1 1
>
> and now it seems that m19 (which contains fewer parameters than m18)
> is a better fit. I don't see how it's possible to remove parameters
> from a model and get a better likelihood, but I will confess that I
> don't entirely understand how these kinds of models are estimated.
> Does this have something to do with approximations that R is making to
> fit the models, or numerical rounding errors? Could either problem be
> due to correlations among variables? All my fixed effects are either
> binary or numeric, and there are some fairly high correlations between
> a few pairs of them (maybe as high as .6 or .65 using Kendall tau) but I
> figured that this would be ok given the large number of data points.
> I think each value of each binary feature is observed at least
> 70-100 times.

The difference could be due to approximations or due to poor convergence. For some models the Laplace approximation can be generalized to a higher-order approximation, called adaptive Gauss-Hermite quadrature (AGQ), but that isn't feasible when you have crossed random effects (and, besides, I still haven't written the code for AGQ even in the case where you don't have crossed random effects).

First I suggest that you check the value stored as the log-likelihood

dput(logLik(m17))
dput(logLik(m18))

to see how different they really are.

You could, if you are feeling brave, try the development version of the lme4 package from http://r-forge.r-project.org. To do that, however, you will need to install a new version of R (R-2.6.1 is preferred) and a new version of the Matrix package. Then you can install the new lme4 with

install.packages("lme4", repos = "http://r-forge.r-project.org")

If you prefer, you can contact me off-list and I can try to fit your models for you using the development version.

> In case it matters, I'm running R 2.5.1 on Linux, with lme4 0.99875-8, and
> below is a full printout of all my output:
>
> > m17 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part + after_part + first_rep + is_open + is_disc + poly(wfreq,2) + poly(wlen_p,2) + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")
>
> > print(m17,corr=F)
> Generalized linear mixed model fit using Laplace
> Formula: is_err ~ sex + starts_turn + before_hes + after_hes +
> before_part + after_part + first_rep + is_open + is_disc +
> poly(wfreq, 2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur,
> 2) + pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
> 2) + (1 | speaker) + (1 | corpus) + (1 | ref)
> Family: binomial(logit link)
> AIC BIC logLik deviance
> 25925 26159 -12934 25869
> Random effects:
> Groups Name Variance Std.Dev.
> ref (Intercept) 0.434599 0.65924
> speaker (Intercept) 0.327813 0.57255
> corpus (Intercept) 0.039722 0.19930
> number of obs: 31017, groups: ref, 2601; speaker, 72; corpus, 2
>
> Estimated scale (compare to 1 ) 0.9810366
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.97887 0.19699 -10.045 < 2e-16 ***
> sexM 0.45553 0.14387 3.166 0.001544 **
> starts_turnTRUE 0.25426 0.08087 3.144 0.001666 **
> before_hesTRUE 0.50943 0.12364 4.120 3.79e-05 ***
> after_hesTRUE 0.26439 0.12162 2.174 0.029712 *
> before_partTRUE 1.07972 0.10748 10.046 < 2e-16 ***
> after_partTRUE 0.47089 0.12363 3.809 0.000140 ***
> first_repTRUE 1.00673 0.13073 7.701 1.35e-14 ***
> is_openTRUE -0.42213 0.09894 -4.267 1.98e-05 ***
> is_discTRUE 0.24676 0.15204 1.623 0.104576
> poly(wfreq, 2)1 -119.72397 11.00516 -10.879 < 2e-16 ***
> poly(wfreq, 2)2 17.35646 5.44456 3.188 0.001433 **
> poly(wlen_p, 2)1 -13.60798 7.26926 -1.872 0.061208 .
> poly(wlen_p, 2)2 -6.43167 5.24119 -1.227 0.219770
> poly(utt_rate, 2)1 5.21605 3.56227 1.464 0.143126
> poly(utt_rate, 2)2 22.75593 3.04234 7.480 7.45e-14 ***
> poly(dur, 2)1 -110.73779 6.15641 -17.987 < 2e-16 ***
> poly(dur, 2)2 38.71283 3.52495 10.983 < 2e-16 ***
> pmean 1.00184 0.17912 5.593 2.23e-08 ***
> poly(log_prange, 2)1 3.71200 3.76913 0.985 0.324702
> poly(log_prange, 2)2 17.07179 2.79954 6.098 1.07e-09 ***
> poly(imean, 2)1 -21.73475 4.75449 -4.571 4.84e-06 ***
> poly(imean, 2)2 28.22247 3.32294 8.493 < 2e-16 ***
> poly(irange, 2)1 -6.47276 3.95778 -1.635 0.101955
> poly(irange, 2)2 6.99340 3.11937 2.242 0.024966 *
>
> #remove wlen^2
> > m18 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part + after_part + first_rep + is_open + is_disc + poly(wfreq,2) + wlen_p + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")
>
> > anova(m17,m18)
> Data:
> Models:
> m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m17: after_part + first_rep + is_open + is_disc + poly(wfreq,
> m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
> m17: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
> m18: (1 | speaker) + (1 | corpus) + (1 | ref)
> m17: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m18: after_part + first_rep + is_open + is_disc + poly(wfreq,
> m17: 2) + poly(wlen_p, 2) + poly(utt_rate, 2) + poly(dur, 2) +
> m18: pmean + poly(log_prange, 2) + poly(imean, 2) + poly(irange,
> m17: 2) + (1 | speaker) + (1 | corpus) + (1 | ref)
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m18 27 25928 26153 -12937
> m17 28 25925 26159 -12934 5.2136 1 0.02241 *
>
> #remove is_disc
> > m19 <- lmer(is_err ~ sex + starts_turn + before_hes + after_hes + before_part + after_part + first_rep + is_open + poly(wfreq,2) + wlen_p + poly(utt_rate,2) + poly(dur,2) + pmean + poly(log_prange,2) + poly(imean,2) + poly(irange,2)+(1|speaker)+(1|corpus)+(1|ref), x=T,y=T,family="binomial")
>
> > anova(m19,m18)
> Data:
> Models:
> m19: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m18: after_part + first_rep + is_open + poly(wfreq, 2) + wlen_p +
> m19: poly(utt_rate, 2) + poly(dur, 2) + pmean + poly(log_prange,
> m18: 2) + poly(imean, 2) + poly(irange, 2) + (1 | speaker) + (1 |
> m19: corpus) + (1 | ref)
> m18: is_err ~ sex + starts_turn + before_hes + after_hes + before_part +
> m19: after_part + first_rep + is_open + is_disc + poly(wfreq,
> m18: 2) + wlen_p + poly(utt_rate, 2) + poly(dur, 2) + pmean +
> m19: poly(log_prange, 2) + poly(imean, 2) + poly(irange, 2) +
> m18: (1 | speaker) + (1 | corpus) + (1 | ref)
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m19 26 25925 26142 -12936
> m18 27 25928 26153 -12937 0 1 1



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 28 Dec 2007 - 19:58:06 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 28 Dec 2007 - 20:30:21 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.