Re: [R] Overdispersion in count data

From: David Winsemius <>
Date: Thu, 03 Apr 2008 01:24:23 +0000 (UTC)

"Wade Wall" <> wrote in

> Thanks for the recommendations and insights.  I tried using glm.nb, but
> it didn't seem to like my data.  I received the message (subscript)
> logical subscript too long.  I am using the same dataframe as my
> previous glm.  Do you know if I need to put the data in a different
> format? 

I was wondering about your data layout. You said you had the flower/noflower data in two different columns. That is not the way I usually offer data to glm(). I would have imagined that log(burn_time) would have been an offset. It would help if you at least offered the audience a sample of ten rows, the output of str() on the data.frame, and the exact call to glm().
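
As a rough sketch of what such a post might contain: the data frame below is simulated, and the column names (site, burn, flowering, not_flowering) are placeholders, not the poster's actual variables.

```r
# Made-up data standing in for the real survey: 24 sites, 3 burn classes.
# All names and values here are hypothetical.
set.seed(1)
dat <- data.frame(
  site          = factor(1:24),
  burn          = factor(rep(c("recent", "mid", "old"), each = 8)),
  flowering     = rpois(24, lambda = 20),
  not_flowering = rpois(24, lambda = 60)
)

head(dat, 10)   # a sample of ten rows
str(dat)        # column types and factor levels

# The model call, exactly as it was run
fit <- glm(cbind(flowering, not_flowering) ~ burn,
           family = binomial, data = dat)
fit$call
```

Pasting the printed output of these three pieces usually lets readers see at a glance whether the response and the predictors are stored the way the model call assumes.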

David Winsemius

> On Wed, Apr 2, 2008 at 12:31 PM, Gavin Simpson
> <> wrote:

>> On Wed, 2008-04-02 at 12:03 -0400, Wade Wall wrote:
>> > Hi all,
>> >
>> > I have count data (number of flowering individuals plus total
>> > number of individuals) across 24 sites and 3 treatments (time
>> > since last burn). Following recommendations in the R Book, I used
>> > a glm with the model y~ burn, with y being two columns
>> > (flowering, not flowering) and burn the time (category) since
>> > burn. However, the residual deviance is roughly 10 times
>> > the number of degrees of freedom, and using the quasibinomial
>> > distribution doesn't change this. Any suggestions as to why the
>> > quasibinomial distribution doesn't change the residual deviance
>> > and how I should proceed?
>> > I know that this level of residual deviance is unacceptable, but
>> > I am not sure if transformations are in order.

>> The quasi families estimate the dispersion parameter rather than
>> assume it is fixed. This doesn't change the estimates for the
>> coefficients, but it may change their standard errors if the
>> estimated dispersion parameter is different from 1, and hence the
>> test statistics and their p-values. As such the residual deviance
>> doesn't change, you are just adjusting the interpretation of
>> coefficients to take account of the over-dispersion.
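
The point above can be checked on simulated data. This is only a sketch: the data are invented (site-level noise on the logit scale creates the overdispersion), and the variable names are placeholders.

```r
# Simulated overdispersed binomial data; all names and values are hypothetical.
set.seed(42)
n_sites <- 24
burn    <- factor(rep(c("recent", "mid", "old"), each = 8))
total   <- rep(100, n_sites)
p_mean  <- c(recent = 0.2, mid = 0.35, old = 0.5)

# Extra site-to-site variation on the logit scale
p <- plogis(qlogis(p_mean[as.character(burn)]) + rnorm(n_sites, sd = 1))
flowering <- rbinom(n_sites, size = total, prob = p)
dat <- data.frame(burn, flowering, not_flowering = total - flowering)

fit_bin   <- glm(cbind(flowering, not_flowering) ~ burn,
                 family = binomial, data = dat)
fit_quasi <- update(fit_bin, family = quasibinomial)

# Same coefficients and same residual deviance under both families
all.equal(coef(fit_bin), coef(fit_quasi))
all.equal(deviance(fit_bin), deviance(fit_quasi))

# Estimated dispersion: Pearson chi-square over residual df
phi <- summary(fit_quasi)$dispersion
phi_pearson <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)

# Standard errors are inflated by sqrt(phi)
se_bin   <- coef(summary(fit_bin))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
se_quasi / se_bin
sqrt(phi)
```

Only the standard errors (and hence the test statistics and p-values) differ between the two fits; the point estimates and the deviance are identical.
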
>> If you are not happy with the fitted model there are numerous
>> options you could try, including fitting a negative binomial (NB)
>> GLM (see glm.nb() in package MASS) or a zero-inflated Poisson or NB
>> model or a Hurdle model. Functions to fit the ZIP/ZINB or Hurdle
>> models can be found in the pscl package.
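
A minimal sketch of the negative binomial option with glm.nb() from MASS, again on simulated data with placeholder names. Two things here are assumptions rather than anything stated in the thread: the response is a single count column (flowering) rather than a two-column matrix, which is possibly relevant to the error reported with glm.nb, and log(total) is used as an offset so that the model describes flowering counts relative to the number of individuals at each site. The zeroinfl() and hurdle() functions in pscl follow a similar formula interface but are not shown.

```r
library(MASS)  # provides glm.nb()

# Simulated overdispersed counts; all names and values are hypothetical.
set.seed(7)
burn  <- factor(rep(c("recent", "mid", "old"), each = 8))
total <- sample(50:150, 24, replace = TRUE)
rate  <- c(recent = 0.2, mid = 0.35, old = 0.5)[as.character(burn)]

# Negative binomial counts with mean total * rate and dispersion theta = 2.
# (Counts are not capped at total here; this is only a toy count model.)
flowering <- rnbinom(24, mu = total * rate, size = 2)
dat <- data.frame(burn, total, flowering)

# Single count response, with log(total) as an offset
fit_nb <- glm.nb(flowering ~ burn + offset(log(total)), data = dat)

summary(fit_nb)
fit_nb$theta   # estimated NB dispersion parameter
```

Whether a count model with an offset or a binomial-type model is the better description of flowering out of a known total is a modelling judgment that this sketch does not settle.
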
Received on Thu 03 Apr 2008 - 01:27:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Apr 2008 - 14:30:26 GMT.
