Re: [R] Two repeated warnings when running gam(mgcv) to analyse my dataset?

From: Simon Wood <s.wood_at_bath.ac.uk>
Date: Tue, 18 Dec 2007 10:54:50 +0000

The model here is just a penalised GLM, and the warnings relate to the GLM fitting process. Fitted probabilities of 0 or 1 can be perfectly appropriate, but they do indicate that the linear predictor is not really uniquely defined, and that some care may be needed in interpreting results (for example, if the fitted probabilities are zero or one, then a CI for the corresponding linear predictor will depend more on the prior assumptions about smoothness than on anything else). This problem is not really GAM specific: it applies to any logistic regression model.
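
To make the "not uniquely defined" point concrete, here is a minimal sketch using a small, perfectly separated toy dataset and plain glm (no mgcv). The names fit_short, fit_long and max_prob_change are made up for this illustration. Fitting the same model with two different iteration limits should give quite different slopes but almost identical fitted probabilities.

```r
# Perfectly separated toy data (chosen for illustration only)
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Same model, two iteration limits. Warnings are suppressed here only
# because they are the expected convergence / "0 or 1" messages.
fit_short <- suppressWarnings(
  glm(y ~ x, family = binomial, control = glm.control(maxit = 10))
)
fit_long <- suppressWarnings(
  glm(y ~ x, family = binomial, control = glm.control(maxit = 50))
)

# Slope of the linear predictor under each iteration limit
slope_short <- unname(coef(fit_short)["x"])
slope_long  <- unname(coef(fit_long)["x"])

# Largest difference between the two sets of fitted probabilities
max_prob_change <- max(abs(fitted(fit_short) - fitted(fit_long)))

print(c(slope_short = slope_short, slope_long = slope_long))
print(max_prob_change)
```

The data pin down the fitted probabilities well, but not the linear predictor that produces them, which is why intervals on the linear predictor scale need care.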

Similarly, the GLM fitting IRLS iterations are not guaranteed to converge, and can fail, especially for overly flexible logistic regression models. Try this, for example....

x <- 1:10
y <- c(0,0,0,0,0,1,1,1,1,1)
glm(y~x,family=binomial)

I get...
...
Warning messages:
1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
  algorithm did not converge
2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
  fitted probabilities numerically 0 or 1 occurred

...as models become more complex the scope for this sort of thing to happen increases, and some simplification may be appropriate.
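
The two warnings can also be checked programmatically on the fitted object rather than read off the console. This is a minimal sketch for the glm example above. The tolerance tol and the names n_extreme and se are arbitrary choices for the illustration, not anything used internally by glm or mgcv.

```r
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# TRUE if the IRLS iterations met the convergence tolerance
print(fit$converged)

# Count fitted probabilities within a small tolerance of 0 or 1.
# tol is an arbitrary threshold for this sketch.
tol <- 1e-6
p <- fitted(fit)
n_extreme <- sum(p < tol | p > 1 - tol)
print(n_extreme)

# Standard errors that dwarf the estimates are another symptom of a
# poorly determined linear predictor.
se <- sqrt(diag(vcov(fit)))
print(cbind(estimate = coef(fit), std_error = se))
```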

That said, mgcv::gam fitting with all smoothing parameters fixed is slightly more likely to fail in this way than glm, or than mgcv::gam with some smoothing parameters estimated. The difference lies in the steps taken to stabilise divergent fit iterations: when all smoothing parameters are fixed, mgcv uses older fitting routines that don't try as hard to stabilise a divergent fit as the newer fitting routines do. This is a bit of an anomaly and I'll try to fix it for a future release.
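
In a long permutation loop like the one quoted below, it can help to record which iterations produced warnings instead of inspecting warnings() at the end. The following is a minimal sketch in base R. fit_with_warnings is a hypothetical helper (not part of mgcv), and glm on permuted toy data stands in for the gam call with fixed sp. mgcv documents a converged component on gam objects as well, but check that against your installed version.

```r
# Hypothetical helper: evaluate a fitting call, returning the fit
# together with any warning messages raised while it ran.
fit_with_warnings <- function(expr) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # keep going, but remember the message
    }
  )
  list(fit = fit, warnings = msgs)
}

x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# The separated data from the example above should trigger warnings
sep <- fit_with_warnings(glm(y ~ x, family = binomial))
print(sep$warnings)

# Toy permutation loop: record convergence and warning counts per fit
set.seed(1)
n_perm <- 20
not_converged <- logical(n_perm)
n_warnings <- integer(n_perm)
for (i in seq_len(n_perm)) {
  y_perm <- sample(y)  # permute the response
  res <- fit_with_warnings(glm(y_perm ~ x, family = binomial))
  not_converged[i] <- !res$fit$converged
  n_warnings[i] <- length(res$warnings)
}
print(which(n_warnings > 0))
```

Fits flagged this way can then be examined individually, or excluded from the permutation results if that is justified.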

best,
Simon

On Monday 17 December 2007 11:53, zhijie zhang wrote:
> Dear Simon,
> Sorry for an incomplete listing of the question.
> #mgcv version is 1.3-29, R 2.6.1, Windows XP
> #m.gam<-gam(mark~s(x)+s(y)+s(lstday2004)+s(ndvi2004)+s(slope)+s(elevation)+disbinary,
> #           family=binomial(logit),data=point)
> The above is the core code in my following loop program.
> It works well if I run the above code only once on my dataset, but
> warnings occur if I run it many times in the following loop.
>
> > while (j<1001) {
> +   index=sample(ID, replace=F)
> +   m.data$x=coords[index,]$x
> +   m.data$y=coords[index,]$y
> +   # For each permutation, we run the GAM using the optimal span for the above model m.gam
> +   s.gam<-gam(mark~s(x)+s(y)+s(lstday2004)+s(ndvi2004)+s(slope)+s(elevation)+disbinary,
> +     sp=c(5.582647e-07,4.016504e-02,2.300424e-04,1.274065e+03,9.558236e-09,1.868827e-08),
> +     family=binomial(logit),data=m.data)
> +   permresults[,i]=predict.gam(s.gam)
> +   i=i+1
> +   if (j%%100==0) print(i)
> +   j=j+1
> + }
> [1] 101
> [1] 201
> [1] 301
> [1] 401
> [1] 501
> [1] 601
> [1] 701
> [1] 801
> [1] 901
> [1] 1001
> warnings() over 50
>
> > warnings()
>
> 1: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... :
>   fitted probabilities numerically 0 or 1 occurred
> ......................................
> 14: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... :
>   Algorithm did not converge
> ..........................
>
> On Dec 17, 2007 4:54 PM, Simon Wood <s.wood_at_bath.ac.uk> wrote:
> > What mgcv version are you running (and on what platform)?
> >
> > On Thursday 13 December 2007 17:46, zhijie zhang wrote:
> > > Dear all,
> > > I ran a GAM (generalized additive model) with gam (mgcv) using the
> > > following code.
> > >
> > > m.gam<-gam(mark~s(x)+s(y)+s(lstday2004)+s(ndvi2004)+s(slope)+s(elevation)+disbinary,
> > >            family=binomial(logit),data=point)
> > >
> > > And two repeated warnings appeared.
> > > Warnings:
> > > 1: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... :
> > >   Algorithm did not converge
> > > 2: In gam.fit(G, family = G$family, control = control, gamma = gamma, ... :
> > >   fitted probabilities numerically 0 or 1 occurred
> > >
> > > Q1: For warning 1, could it be solved by changing the value of the
> > > mgcv.tol option, e.g. gam.control(mgcv.tol=1e-7)?
> > >
> > > Q2: For warning 2, does it affect the results if "fitted
> > > probabilities numerically 0 or 1 occurred"? How can I solve it?
> > >
> > > I haven't tried the possible solutions, because it takes such a
> > > long time to run the whole program.
> > > Could anybody suggest solutions?
> > > Any help or suggestions are greatly appreciated.
> > > Thanks.
> >
> > --
> >
> > > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > > +44 1225 386603 www.maths.bath.ac.uk/~sw283

-- 

> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603 www.maths.bath.ac.uk/~sw283
______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 18 Dec 2007 - 11:21:40 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 18 Dec 2007 - 12:30:20 GMT.