Re: [R] GAM, GLM, Logit, infinite or missing values in 'x'

From: Simon Wood <s.wood_at_bath.ac.uk>
Date: Tue, 5 Feb 2008 10:57:05 +0000

Anders,

Thanks for sending the data. The fix is to reduce the convergence tolerance `epsilon' in the `control' argument to `gam' (1e-8 is fine). I'll put a trap and informative error message into a future mgcv release.
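
A minimal sketch of that fix. Anders's `dataS` isn't available, so it uses simulated data; the variables `x1`, `x2` and `y` are made up for illustration. The only change from an ordinary call is passing a tighter IRLS tolerance through `control = gam.control(epsilon = 1e-8)`. With the real data the call would presumably be `gam(form.glogit, data = dataS, na.action = na.omit, family = binomial, control = gam.control(epsilon = 1e-8))`.

```r
library(mgcv)

# Simulated stand-in for the real data (hypothetical variables x1, x2, y).
set.seed(42)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
eta <- 3 * sin(2 * pi * dat$x1) + 2 * dat$x2 - 1
dat$y <- rbinom(n, 1, plogis(eta))

# Binomial GAM with a tighter IRLS convergence tolerance, passed via the
# control argument as suggested above.
fit <- gam(y ~ s(x1) + x2, data = dat, family = binomial,
           na.action = na.omit,
           control = gam.control(epsilon = 1e-8))
summary(fit)
```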

Here's what happens. The model is quite ill conditioned (there is near collinearity among the parametric terms), and the IRLS scheme used for fitting converges quite slowly. With the default convergence tolerance, the parameter estimates are not converged tightly enough to be fed into the iteration that finds the derivatives of the AIC/UBRE score with respect to the smoothing parameters. Those derivative iterations then diverge: ill conditioning makes them sensitive to the slight error left in the IRLS parameter estimates, and it doesn't help that the linear predictor is "practically infinite" in places. Tightening the IRLS convergence tolerance improves the parameter estimates enough for the derivative iterations to converge.
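
One rough way to check for the near collinearity described above is the condition number of the parametric model matrix, which base R's `kappa()` estimates. The sketch below uses invented data in which `x3` is almost a copy of `x1`; with the real data one might pass something like `model.matrix(form.logit, dataS)` instead. What counts as "large" is a judgment call.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 1e-4)   # invented: nearly collinear with x1

X_ok  <- model.matrix(~ x1 + x2)
X_bad <- model.matrix(~ x1 + x2 + x3)

# kappa() returns an estimate of the condition number; a very large value
# signals near collinearity among the columns.
kappa_ok  <- kappa(X_ok)
kappa_bad <- kappa(X_bad)
print(c(ok = kappa_ok, bad = kappa_bad))
```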

best,
Simon

On Tuesday 08 January 2008 06:35, Anders Schwartz Corr wrote:
> Hi,
>
> I'm running gam (mgcv version 1.3-29) and glm (logit) (stats, R 2.6.1) on
> the same models/data, and I get an error from the gam() model and
> warnings from the glm() model.
>
> R-help suggested that the glm() warnings are due to the model perfectly
> predicting the binary outcome. Perhaps the model overfits the data? I
> inspected my data, and it was not immediately obvious to me (though I guess
> it will be to the sharper among you) how this could be the case.
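
A tiny invented example of the situation the list described: when a covariate separates the 0s from the 1s perfectly, the maximum-likelihood coefficients run off towards infinity and glm() warns that fitted probabilities are numerically 0 or 1. Rather than reading warnings() by hand, the sketch collects the warnings with withCallingHandlers(); the data are made up for illustration.

```r
# Invented data: every x <= 5 has y = 0 and every x >= 6 has y = 1,
# so x separates the outcome perfectly.
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Record each warning glm() raises without interrupting the fit.
collected <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    collected <<- c(collected, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(collected)
print(range(fitted(fit)))
```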
>
> The gam() errors vanish when I delete one covariate (it doesn't matter
> which one). Can I write a loop into the code so that, if an error is
> returned (there doesn't seem to be an is.error() function, unfortunately), I
> pare off one of the covariates and rerun gam()? That would be ideal. I could
> set options(error = f), where f() reruns gam() with one fewer covariate
> until it works, but the gam() call sits inside several loops that would
> break on the error, so I'd like to find another option.
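
Base R has no is.error(), but tryCatch() (or try() followed by inherits(res, "try-error")) traps the error locally, so the enclosing loops keep running. A minimal sketch, assuming it is acceptable to drop covariates from the end of the list; `fit_dropping_covariates()` and the toy `picky_glm()` (which fails whenever it gets more than two covariates, standing in for the failing gam() call) are hypothetical names made up for this example. Per Simon's reply, tightening `epsilon` is the real fix; this is only a generic fallback pattern.

```r
# Hypothetical helper: refit with one fewer covariate (dropping the last one)
# each time the fitting function throws an error.
fit_dropping_covariates <- function(response, covariates, fit_fun, ...) {
  while (length(covariates) > 0) {
    form <- reformulate(covariates, response = response)
    fit <- tryCatch(fit_fun(form, ...), error = function(e) e)
    if (!inherits(fit, "error")) {
      return(list(fit = fit, covariates = covariates))
    }
    message("Fit failed (", conditionMessage(fit), "); dropping ",
            covariates[length(covariates)])
    covariates <- covariates[-length(covariates)]
  }
  stop("no covariate subset could be fitted")
}

# Toy data and a toy fitting function that errors with > 2 covariates.
set.seed(1)
d <- data.frame(y = rbinom(50, 1, 0.5),
                x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
picky_glm <- function(form, data) {
  if (length(all.vars(form)) - 1 > 2) stop("too many covariates")
  glm(form, data = data, family = binomial)
}

res <- fit_dropping_covariates("y", c("x1", "x2", "x3"), picky_glm, data = d)
print(res$covariates)
```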
>
> My glm and gam models are below. Any suggestions are very much
> appreciated.
>
> Best,
>
> Anders
>
> > form.logit
>
> outbinary ~ a_norm_total2 + I(a_norm_total2^2) + prop + igoprop +
> gpconc + ter + open + igototal + cinc.nmc + demsOnumstat +
> diversity + cincOter + polity2
>
> > form.glogit
>
> outbinary ~ s(a_norm_total2) + s(prop) + s(prop, by = a_norm_total2) +
> igoprop + gpconc + ter + open + igototal + cinc.nmc + demsOnumstat +
> diversity + cincOter + polity2
>
> GAM error message:
> avt.2glogit<-gam(form.glogit, data=dataS, na.action=na.omit,family=binomial)
> Error in eigen(hess1, symmetric = TRUE) :
>   infinite or missing values in 'x'
> Calls: gam -> gam.outer -> newton -> eigen
>
> GLM warnings:
> There were 29 warnings (use warnings() to see them)
>
> > warnings()
>
> Warning messages:
> 1: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
> fitted probabilities numerically 0 or 1 occurred
> 2: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
> fitted probabilities numerically 0 or 1 occurred
> 3: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
> fitted probabilities numerically 0 or 1 occurred
> 4: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
> fitted probabilities numerically 0 or 1 occurred
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 

Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603 www.maths.bath.ac.uk/~sw283
Received on Tue 05 Feb 2008 - 11:15:43 GMT
