Re: [R] glm cannot find valid starting values

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 21 Jul 2006 - 22:10:14 EST

On Fri, 21 Jul 2006, Dan Bebber wrote:

> glm(S ~ -1 + Mdif, family=quasipoisson(link=identity), start=strt, sdat)
> gives error:
>
> > Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart
> > =
> > etastart, :
> > cannot find valid starting values: please specify some
>
> strt is set to be the coefficient for a similar fit
> glm(S ~ -1 + I(Mdif + 1),...
> i.e. (Mdif + 1) is a vector similar to Mdif.
> The error appears to occur when some values of Mdif are negative,
> though I have not had this problem with simulated datasets.

Right: if Mdif contains both positive and negative values there are no coefficients that are valid for that model (you are bound to predict negative means).

You often do better to take etastart from another fit than start, but that will not help here, I believe.

BTW, your example cannot be pasted in as 'sdat' self-references. It could be fixed, but I gave up at that point.

>
> Any solutions greatly appreciated (full details and data below).
>
> Dan Bebber
> Department of Plant Sciences
> University of Oxford
>
> OS: WinXP Pro SP2 and Win ME (tried both)
> Processor: Dual Intel Xeon and Pentium 4 (tried both)
> R version: 2.3.0 and 2.3.1 (tried both)
>
> #Full details (can be pasted directly into R):
> #Data:
> sdat <- data.frame(
> S = c(0, 0, 0, 0, 28, 0, 1, 7, 0, 0, 39, 2, 0, 0, 40, 0, 0, 0, 0,
> 0, 0, 15, 0, 0, 3, 0, 0, 0, 2, 0, 3, 0, 30, 0, 20, 0, 1, 0, 0,
> 1, 21, 0, 0, 4, 14, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 3, 0,
> 2, 5, 0, 0, 0, 0, 0, 0, 0, 25, 0, 5, 0, 0, 0, 1, 0, 1, 0, 0,
> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 0, 0, 0, 1, 10, 0, 0, 0, 0, 0, 9, 1, 1, 1, 1, 0, 0, 3,
> 0, 27, 7, 0, 0, 0, 0, 0, 1, 1, 4, 2, 1, 2, 4, 1, 4, 6, 12, 4,
> 6, 3, 4, 0, 4, 0, 6, 1, 3, 1, 4, 4, 1, 1, 2, 1, 6, 1, 0, 0, 1,
> 0, 1, 0, 0, 6, 0, 0, 0, 0, 2, 2, 3, 1, 6, 2, 2, 1, 1, 4, 4, 3,
> 3, 7, 1, 3, 5, 6, 4, 0, 1, 4, 2, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
> 1, 0, 4, 0, 0, 1, 0, 2, 0, 0, 1, 2, 0, 4, 0, 2, 0, 3, 0, 2, 2,
> 0, 4, 0, 2, 0, 1, 2, 3, 0, 0, 0, 0, 3, 1, 0, 0, 0, 3, 2, 1, 2,
> 1, 1, 2, 0, 0, 3, 3, 1, 0, 1, 1, 2, 0, 1, 0, 1, 0, 1, 3, 2, 0,
> 0, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 4, 2, 4, 0, 2, 0, 0, 0, 0, 0,
> 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 3, 2, 0, 0, 0, 0, 0,
> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 1, 0, 0, 1),
> M = 620+c(0,cumsum(sdat$S[-328])))
> #S is the (unknown) number of N individuals that irreversibly change state
> in a time
> #interval t. The data provided are a subset of the full data set.
> #M is the cumulative sum of individuals that have changed state up to t-1.
> #Assume that the rate of state change is constant (S ~ kM), but the
> #distribution of S is clustered.
> #The goal is to estimate N.
> #N can be estimated by fitting:
> qpglm <- glm(S ~ M, family = quasipoisson(link = identity), sdat)
> summary(qpglm)
> N.est <- -coef(qpglm)[1]/coef(qpglm)[2]
> N.est
> #i.e. x-intercept is minus intercept/slope
> #To estimate confidence limits on N.est, fit models without intercept to
> #N.est - M + x, where x is an integer. The model will have the lowest
> deviance
> #when x = 0.

> x <- 0
> Mdif <- N.est - M + x
> qpglm2 <- glm(S ~ -1 + Mdif, family = quasipoisson(link = identity), sdat)
> summary(qpglm2)
> #Use analysis of deviance to estimate confidence limits on N.est:

> disp <- summary(qpglm)$dispersion
> dfres <- df.residual(qpglm)
> dev.res <- deviance(qpglm)
> #From MASS4, p. 210, assume that changes in deviance scaled by
> #dispersion as |x| increases have an F distribution
> dev.crit <- dev.res+qf(0.95,1,dfres)*disp
> dev.crit
> #values of x for which the deviance = dev.crit give approximate 95%

> confidence limits
> #on N.est.
> #The error occurs when x <= -91.7:

> x <- -91.7
> sdat$Mdif <- N.est - sdat$M + x
> strt <- coef(glm(S ~ -1 + I(Mdif+1), family = quasipoisson(link = identity),
> sdat))
> qpglm2 <- glm(S ~ -1 + Mdif, family = quasipoisson(link = identity),
> start=strt, sdat)
> #The problem is that this interferes with optimization to find values of x
> for which
> #deviance = dev.crit
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri Jul 21 22:16:53 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 22 Jul 2006 - 00:15:52 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.