[R] problem with glm(family=binomial) when some levels have only 0 proportion values

From: Jürg Schulze <Juerg.Schulze_at_stud.unibas.ch>
Date: Wed, 02 Mar 2011 11:01:42 +0100


Hello everybody

I want to compare the proportions of germinated seeds (seed batches of size 10) among three plant types (1, 2, 3) using a binomial GLM (following the method in Crawley: Statistics: An Introduction using R, p. 247).
The problem seems to be that in two of the plant types (2 and 3) all plants have a germination proportion of 0.
Here are my data and the model I'm running:

      success failure type
 [1,]       0      10    3
 [2,]       0      10    2
 [3,]       0      10    2
 [4,]       0      10    2
 [5,]       0      10    2
 [6,]       0      10    2
 [7,]       0      10    2
 [8,]       4       6    1
 [9,]       4       6    1
[10,]       3       7    1
[11,]       5       5    1
[12,]       7       3    1
[13,]       4       6    1
[14,]       0      10    3
[15,]       0      10    3
[16,]       0      10    3
[17,]       0      10    3
[18,]       0      10    3
[19,]       0      10    3
[20,]       0      10    2
[21,]       0      10    2
[22,]       0      10    2
[23,]       9       1    1
[24,]       6       4    1
[25,]       4       6    1
[26,]       0      10    3
[27,]       0      10    3
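For anyone who wants to reproduce this, the table can be typed into R roughly as follows. This is a sketch: the vectors are transcribed by hand from the table, and `type` is coded here with levels "1", "2", "3" (the summary output labels two levels typeFxC and typeFxD, so the real factor levels evidently carry other names). The names `germ` and `fit0` are only illustrative.

```r
# Germination data transcribed from the table: 27 batches of 10 seeds each.
# Assumption: 'type' coded as a factor with levels "1", "2", "3" (type 1 is
# the reference level); the original level names are not all given.
success <- c(0, 0, 0, 0, 0, 0, 0, 4, 4, 3, 5, 7, 4,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 6, 4, 0, 0)
failure <- 10 - success
type <- factor(c(3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
                 3, 3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 3))
germ <- data.frame(success, failure, type)

# Pooled successes and failures per plant type
aggregate(cbind(success, failure) ~ type, data = germ, FUN = sum)

# Same model as in the post, with a two-column (success, failure) response
fit0 <- glm(cbind(success, failure) ~ type, family = binomial, data = germ)
summary(fit0)
```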

y <- cbind(success, failure)

Call:
glm(formula = y ~ type, family = binomial)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.3521849  -0.0000427  -0.0000427  -0.0000427   2.6477556  

Coefficients:

               Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.04445    0.21087   0.211    0.833
typeFxC      -23.16283 6696.13233  -0.003    0.997
typeFxD      -23.16283 6696.13233  -0.003    0.997

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 134.395  on 26  degrees of freedom
Residual deviance:  12.622  on 24  degrees of freedom
AIC: 42.437

Number of Fisher Scoring iterations: 20
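With a single factor as the only predictor (and R's default treatment contrasts), the intercept should be the pooled log-odds of type 1 and each other coefficient the difference in pooled log-odds from type 1. A small arithmetic sketch, using pooled counts added up by hand from the table (9 batches per type), reproduces the intercept and its standard error and shows that the log-odds for types 2 and 3 is minus infinity:

```r
# Pooled counts per type, summed by hand from the table:
# type 1 = 46 successes / 44 failures; types 2 and 3 = 0 / 90 each.
s <- c(t1 = 46, t2 = 0, t3 = 0)
f <- c(t1 = 44, t2 = 90, t3 = 90)

# Pooled log-odds per type (same as qlogis(s / (s + f)))
log_odds <- log(s / f)
log_odds  # t1 is about 0.04445; t2 and t3 are -Inf because log(0) = -Inf

# Wald standard error of a pooled log-odds: sqrt(1/successes + 1/failures)
se_intercept <- sqrt(1 / s[["t1"]] + 1 / f[["t1"]])
se_intercept  # about 0.21087, matching the (Intercept) row
```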

The standard errors are huge, and neither the difference between plant types 1 and 2 nor the difference between types 1 and 3 is significant. If I add 1 to all successes, so that the 0 values disappear, the standard errors become much smaller and I find highly significant differences between the plant types.
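A possible way to see where the huge standard errors come from: when a group has no successes at all, the maximum-likelihood log-odds for that group is minus infinity, so the fitting algorithm keeps pushing the estimate down and stops at some large finite value (about -23 here) once the deviance hardly changes any more. The Wald standard error of a group's log-odds, roughly 1 / sqrt(n p (1 - p)) with p = plogis(eta), blows up as eta becomes very negative. The sketch below only illustrates that growth for n = 90 trials; it is not meant to reproduce the exact value 6696 in the output, and `wald_se` is a hypothetical helper.

```r
# Hypothetical helper: Wald SE of a group's log-odds eta estimated from
# n binomial trials, 1 / sqrt(n * p * (1 - p)) with p = plogis(eta).
wald_se <- function(n, eta) {
  p <- plogis(eta)
  1 / sqrt(n * p * (1 - p))
}

# As eta heads towards -Inf, p -> 0 and the SE grows roughly like exp(-eta / 2),
# so the z value (estimate / SE) shrinks towards 0 even though the estimate
# itself becomes ever more extreme.
eta <- c(-2, -5, -10, -15, -20, -23)
data.frame(eta = eta,
           se  = wald_se(n = 90, eta = eta),
           z   = eta / wald_se(n = 90, eta = eta))
```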

suc <- success + 1     # add 1 to every success count
fail <- 11 - suc       # batch size becomes 11, so failures stay 10 - success
Y <- cbind(suc, fail)

Call:
glm(formula = Y ~ type, family = binomial)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.279e+00  -4.712e-08  -4.712e-08   0.000e+00   2.584e+00  

Coefficients:

             Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.2231     0.2023   1.103     0.27
typeFxC      -2.5257     0.4039  -6.253 4.02e-10 ***
typeFxD      -2.5257     0.4039  -6.253 4.02e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

     Null deviance: 86.391  on 26  degrees of freedom
Residual deviance: 11.793  on 24  degrees of freedom
AIC: 76.77

Number of Fisher Scoring iterations: 4
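The second fit can be checked by hand in the same way. In this sketch the pooled counts after adding 1 are worked out from the table (each batch now has 11 seeds, 9 batches per type): type 1 has 55 successes and 44 failures, types 2 and 3 have 9 successes and 90 failures each. The resulting log-odds and Wald standard errors match the coefficients above.

```r
# Pooled counts after adding 1 to every success (failures unchanged)
s1 <- c(t1 = 55, t2 = 9, t3 = 9)
f1 <- c(t1 = 44, t2 = 90, t3 = 90)
lo <- log(s1 / f1)  # pooled log-odds per type

intercept <- lo[["t1"]]                      # about 0.2231
effect    <- lo[["t2"]] - lo[["t1"]]         # about -2.5257 (same for t3)
se_effect <- sqrt(1/55 + 1/44 + 1/9 + 1/90)  # about 0.4039
c(intercept = intercept, effect = effect, se_effect = se_effect)
```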


So I think the problem is that all plants in groups 2 and 3 have 0 values. Do you agree?
I don't know why this is a problem, or how I could explain to a reviewer why a data transformation (+1) is necessary for such a dataset.
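One way to look at what the +1 does is to vary the added constant. The hypothetical helper `contrast_for()` below recomputes the type 2 vs type 1 log-odds difference from the pooled counts when a constant k is added to every success count, with failures unchanged as in fail <- 11 - suc. It is only an arithmetic sketch, but it suggests that the size of the estimated difference is set by the choice of k, and that it tends to -Inf as k goes to 0, which is where the untransformed fit is heading.

```r
# Hypothetical helper: type-2-vs-type-1 log-odds difference after adding k to
# every success. Pooled counts from the table, 9 batches per type:
# type 1 = 46 successes / 44 failures, type 2 = 0 successes / 90 failures.
contrast_for <- function(k) {
  log((0 + 9 * k) / 90) - log((46 + 9 * k) / 44)
}

k <- c(1, 0.5, 0.1, 0.01)
data.frame(k = k, contrast = contrast_for(k))  # k = 1 gives about -2.5257
```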

I would greatly appreciate any comments.
Juerg
______________________________________

Jürg Schulze
Department of Environmental Sciences
Section of Conservation Biology
University of Basel
St. Johanns-Vorstadt 10
4056 Basel, Switzerland
Tel.: ++41/61/267 08 47

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 02 Mar 2011 - 10:20:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 02 Mar 2011 - 16:10:20 GMT.
