Re: [R] analyzing binomial data with spatially correlated errors

From: Douglas Bates <>
Date: Thu, 20 Mar 2008 08:08:10 -0500

On Wed, Mar 19, 2008 at 3:02 PM, Ben Bolker <> wrote:
> Jean-Baptiste Ferdy <Jean-Baptiste.Ferdy <at>> writes:
> >
> > Dear R users,
> >
> > I want to explain binomial data by a serie of fixed effects. My problem is
> > that my binomial data are spatially correlated. Naively, I thought I could
> > found something similar to gls to analyze such data. After some reading, I
> > decided that lmer is probably to tool I need. The model I want to fit would
> > look like
> >
> > lmer ( cbind(n.success,n.failure) ~ (x1 + x2 + ... + xn)^2 , family=binomial,
> > correlation=corExp(1,form=~longitude+latitude))
> >
> > This doesn't work because lmer says it needs a random effect in the model.
> > And, apart from the spatial random effect that I want to capture by computing
> > the correlation matrix, I have no other random effect.
> >
> > There must be something I do not understand here... I can't get why gls can do
> > this on gaussian data but lmer can't on binomial ones.
> >
> This is a hard problem. The proximal issue is that lmer does not yet
> include a correlation term (I'm a little surprised you didn't get an
> error to that effect), and won't for some time since it is still in heavy
> development for more basic features.

The lmer function does have a ... argument that will swallow up the correlation argument (and proceed to ignore it). I think there are advantages to including a ... argument but one of the disadvantages is this quietly ignoring arguments that are not defined in the function (and are not mentioned in any documentation about the function).

> If your data were normal you could
> use gls from the nlme package, but nlme doesn't do generalized LMMs
> (only LMMs and NLMMs). You could *almost* use glmmPQL from the MASS package,
> which allows you to fit any lme model structure
> within a GLM 'wrapper', but as far as I know it wraps only lme (
> which requires at least one random effect) and not gls.

I don't think it is at all trivial to define a model for a binary or binomial response with a spatial correlation structure. It may be well-known; it's just that I don't know how to do it easily. I just saw the post by Roger Bivand who gave a reference on one approach (although usually when one finds oneself using something like a group factor with only one level to define the random effects it is a sign that something suspicious is underway. Expecting to estimate a variance from a single observation should raise a few red flags.)

The way Jose and I approached correlation structures in lme is to pre-whiten the data and the model matrices, conditional on the parameters that determine the (additional) marginal correlation of the responses. That works when the conditional distribution of the response is normal. I don't see how it could work for a binary or binomial response. A linear combination of normals is normal. A non-trivial linear combination of binomials is not binomial.

In writing lmer I have found that I must think about the model very carefully before I can determine a suitable computational method. I spent most of the month of January staring at a sequence of transformations on the whiteboard in my office trying to determine what should go where and how to implement the whole chain in data structures and algorithms.

The normal distribution and linear predictors fit together in such a way that one can factor out correlation structures or variance functions. Get away from the normal distribution and things start to break.

Think of the typical way in which we write a linear model:

y = X b + e

where y is an n-dimensional response, X is an n by p model matrix, b is a p-dimensional coefficient vector to be estimated and e is the "noise" term with a multivariate normal distribution that has mean 0. Now, try to write a generalized linear model that way. It doesn't work. You must express a GLM differently, relating the mean to the liner predictor, and taking into account the way the variance relates to the mean.

This is more than a notational difference. In a linear model the effect of b is limited to the linear predictor and, through that, the mean. The variance-covariance specification can be separated from the mean and, hence, can be specified separately. It is easy to fool yourself into thinking that the same should be true for generalized linear models, just like it is easy to fool yourself into thinking that all the arguments for the lme function will work unchanged in lmer.

> You could try gee or geoRglm -- neither trivially easy, I think ...
> Ben Bolker
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.
> mailing list PLEASE do read the posting guide and provide commented, minimal, self-contained, reproducible code. Received on Thu 20 Mar 2008 - 13:12:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 20 Mar 2008 - 15:30:24 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive