Re: [R] Likelihood ratio test between glm and glmer fits

From: Dimitris Rizopoulos <Dimitris.Rizopoulos_at_med.kuleuven.be>
Date: Wed, 16 Jul 2008 20:22:05 +0200

well, for computing the p-value you need to use pchisq() and dchisq() (check ?dchisq for more info). For model fits with a logLik method you can directly use the following simple function:

lrt <- function (obj1, obj2) {

     L0 <- logLik(obj1)
     L1 <- logLik(obj2)
     L01 <- as.vector(- 2 * (L0 - L1))
     df <- attr(L1, "df") - attr(L0, "df")
     list(L01 = L01, df = df,
         "p-value" = pchisq(L01, df, lower.tail = FALSE))
}

library(lme4)
gm0 <- glm(cbind(incidence, size - incidence) ~ period,

               family = binomial, data = cbpp) gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),

               family = binomial, data = cbpp)

lrt(gm0, gm1)

However, there are some issues regarding this likelihood ratio test.

  1. The null hypothesis is on the boundary of the parameter space, i.e., you test whether the variance for the random effect is zero. For this case the assumed chi-squared distribution for the LRT may *not* be totally appropriate and may produce conservative p-values. There is some theory regarding this issue, which has shown that the reference distribution for the LRT in this case is a mixture of a chi-squared(df = 0) and chi-squared(df = 1). Another option is to use simulation-based approach where you can approximate the reference distribution of the LRT under the null using simulation. You may check below for an illustration of this procedure (not-tested):

X <- model.matrix(gm0)
coefs <- coef(gm0)
pr <- plogis(c(X %*% coefs))
n <- length(pr)
new.dat <- cbpp
Tobs <- lrt(gm0, gm1)$L01
B <- 200
out.T <- numeric(B)
for (b in 1:B) {

     y <- rbinom(n, cbpp$size, pr)
     new.dat$incidence <- y
     fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
     fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
     out.T[b] <- lrt(fit0, fit1)$L01

}
# estimate p-value
(sum(out.T >= Tobs) + 1) / (B + 1)

2) For the glmer fit you have to note that you work with an approximation to the log-likelihood (obtained using numerical integration) and not the actual log-likelihood.

I hope it helps.

Best,
Dimitris



Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium

Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
      http://www.student.kuleuven.be/~m0390867/dimitris.htm


Quoting COREY SPARKS <corey.sparks_at_UTSA.EDU>:

> Dear list,
> I am fitting a logistic multi-level regression model and need to
> test the difference between the ordinary logistic regression from a
> glm() fit and the mixed effects fit from glmer(), basically I want
> to do a likelihood ratio test between the two fits.
>
>
> The data are like this:
> My outcome is a (1,0) for health status, I have several (1,0) dummy
> variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
> divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is
> continuous (20 to 100).
> My higher level is called munid and has 581 levels.
> The data have 45243 observations.
>
> Here are my program statements:
>
> #GLM fit
> ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
> data=mx.merge)
> #GLMER fit
> ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
> data=mx.merge)
>
> I cannot find a method in R that will do the LR test between a glm
> and a glmer fit, so I try to do it using the liklihoods from both
> models
>
> #form the likelihood ratio test between the glm and glmer fits
> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>
>> ML
> 79.60454
> attr(,"nobs")
> n
> 45243
> attr(,"nall")
> n
> 45243
> attr(,"df")
> [1] 14
> attr(,"REML")
> [1] FALSE
> attr(,"class")
> [1] "logLik"
>
> #Get the associated p-value
> dchisq(x2,14)
> ML
>> 5.94849e-15
>
> Which looks like an improvement in model fit to me. Am I seeing
> this correctly or are the two models even able to be compared? they
> are both estimated via maximum likelihood, so they should be, I think.
> Any help would be appreciated.
>
> Corey
>
> Corey S. Sparks, Ph.D.
>
> Assistant Professor
> Department of Demography and Organization Studies
> University of Texas San Antonio
> One UTSA Circle
> San Antonio, TX 78249
> email:corey.sparks_at_utsa.edu
> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 16 Jul 2008 - 18:19:52 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Jul 2008 - 09:31:50 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive