Re: [R] problems with glm

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon 16 Jan 2006 - 01:57:22 EST

On Sun, 15 Jan 2006, Roland R Regoes wrote:

> I am sorry, I should have emphasized that: I really mean 'family =
> binomial(link="log")'.
>
> I am fitting infection data. Failure and success correspond to
> being infected or not. The probability of success (ie, not being
> infected in my context) is derived from a population model describing
> the interaction between hosts and parasites:
> P(not infected) = exp(-InfectionRate*ParasiteDose)

> 'ParasiteDose' corresponds to 'predictor'. I want to estimate the
> infection rate. Hence I need 'binomial(link="log")' and must set
> intercept=0.

Then your model is not appropriate for your data. The only real evidence is coming from the last two points which have approximately equal failure rate yet differ by a factor of 5 in the predictor. Also, your model without intercept predicts probablilty one for the first case, and the log link in R does not allow probability zero. If we drop that case (which contributes nothing to the model) we get

success <- c(12,11,14,14,11,13,11,12)
failure <- c(0,0,0,0,0,0,2,2)
predictor <- 80*5^(0:7)
glm(cbind(success,failure) ~ predictor+0,

            family = binomial(link="log"),
            control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))

which works, albeit with an infinite null deviance (as the null model is inappropriate).

> On Mon, Jan 16, 2006 at 12:16:09AM +1100, Bill.Venables@csiro.au wrote:
>> Most of your problems seem to come from 'link = "log"' whereas you
>> probably mean 'link = logit' (which is the default. Hence:
>>
>> ##########################################
>>> success <- c(13,12,11,14,14,11,13,11,12)
>>> failure <- c(0,0,0,0,0,0,0,2,2)
>>> predictor <- c(0,80*5^(0:7))
>>> glm(cbind(success,failure) ~ predictor,
>> + family = binomial, #(link="log"),
>> + control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
>> Deviance = 7.621991 Iterations - 1
>> Deviance = 6.970934 Iterations - 2
>> Deviance = 6.941054 Iterations - 3
>> Deviance = 6.940945 Iterations - 4
>> Deviance = 6.940945 Iterations - 5
>>
>> Call: glm(formula = cbind(success, failure) ~ predictor, family =
>> binomial, control = glm.control(epsilon = 1e-08, trace = TRUE,
>> maxit = 50))
>>
>> Coefficients:
>> (Intercept) predictor
>> 4.180e+00 -4.106e-07
>>
>> Degrees of Freedom: 8 Total (i.e. Null); 7 Residual
>> Null Deviance: 12.08
>> Residual Deviance: 6.941 AIC: 15.85

>> -----Original Message-----
>> From: r-help-bounces@stat.math.ethz.ch
>> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Roland R Regoes
>> Sent: Sunday, 15 January 2006 11:04 PM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] problems with glm
>>
>>
>> Dear R users,
>>
>> I am having some problems with glm. The first is an error message
>> "subscript out of bounds". The second is the fact that reasonable
>> starting values are not accepted by the function.
>>
>> To be more specific, here is an example:
>>
>>> success <- c(13,12,11,14,14,11,13,11,12)
>>> failure <- c(0,0,0,0,0,0,0,2,2)
>>> predictor <- c(0,80*5^(0:7))
>>> glm(cbind(success,failure) ~ predictor + 0,
>> + family=binomial(link="log"),
>> + control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
>> Deviance = 3.348039 Iterations - 1
>> Error: subscript out of bounds
>> In addition: Warning message:
>> step size truncated: out of bounds
>>>
>>
>> The model with intercept yields:
>>
>>> glm(cbind(success,failure) ~ predictor ,
>> + family=binomial(link="log"),
>> + control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>>
>> Call: glm(formula = cbind(success, failure) ~ predictor, family =
>> binomial(link = "log"), control = glm.control(epsilon = 1e-08,
>> trace = FALSE, maxit = 50))
>>
>> Coefficients:
>> (Intercept) predictor
>> -5.830e-17 -4.000e-08
>>
>> Degrees of Freedom: 8 Total (i.e. Null); 7 Residual
>> Null Deviance: 12.08
>> Residual Deviance: 2.889 AIC: 11.8
>> There were 33 warnings (use warnings() to see them)
>>> warnings()
>> 1: step size truncated: out of bounds
>> ...
>> 31: step size truncated: out of bounds
>> 32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y,
>> weights = weights, start = start, etastart = etastart, ...
>> 33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X,
>> y = Y, weights = weights, start = start, etastart = etastart, ...
>>>
>>
>> Since the intercept in the above fit is fairly small I thought I
>> could use -4e-8 as a reasonable starting value in a model without
>> intercept. But to no avail:
>>
>>> glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
>> + family=binomial(link="log"),
>> + control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
>> Error in glm.fit(x = X, y = Y, weights = weights, start = start,
>> etastart = etastart, :
>> cannot find valid starting values: please specify some
>>>
>>
>> I am stuck here. Am I doing something wrong when specifying the
>> starting value? I would appreciate any help. (I could not find anything
>> relevant in the documentation of glm and the mailing list archives, but
>> I did not read the source code of glm yet.)
>>
>> Roland
>>
>>
>> PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX
>> 10.4.4
>>
>> --
>> -----------------------------------------------------------------------
>> Roland Regoes
>> Theoretical Biology
>> Universitaetsstr. 16
>> ETH Zentrum, CHN H76.1
>> CH-8092 Zurich, Switzerland
>>
>> tel: +41-44-632-6935
>> fax: +41-44-632-1271
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>
> --
> -----------------------------------------------------------------------
> Roland Regoes
> Theoretical Biology
> Universitaetsstr. 16
> ETH Zentrum, CHN H76.1
> CH-8092 Zurich, Switzerland
>
> tel: +41-44-632-6935
> fax: +41-44-632-1271
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon Jan 16 02:04:23 2006

This archive was generated by hypermail 2.1.8 : Mon 16 Jan 2006 - 06:08:35 EST