Re: [R] GAM selection error msgs (mgcv & gam packages)

From: Simon Wood <sw283_at_maths.bath.ac.uk>
Date: Thu 22 Jun 2006 - 02:09:43 EST

>
> My question concerns two error messages, one from the gam package and one
> from the mgcv package (see below). I have read the help files and the
> Chambers and Hastie book but cannot work out how to solve the problem.
> Could you please tell me what I need to change so that the commands do not
> generate these messages?
>
> I am trying to carry out model selection for a GAM that will be used for
> prediction, so my focus is on AIC. My data set has 3038 records, 116
> predictor variables and a binary response variable (0 or 1). There is no
> current understanding of how the predictors relate to the response, so I
> am relying on the GAM to select appropriate predictors.

best,
Simon

--
Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY
+44 (0)1225 386603 www.maths.bath.ac.uk/~sw283/

>
> Thanks
> Savrina
>
> *mgcv package 1.3-12:
>
> # I start by specifying the full model with all 116 predictors, including
> # an isotropic smooth of the 3-D location variables (when I specify only
> # the first 14 predictors I get no error message)
>> m0 <- gam(label ~ s(x, y, z, k = 50) + s(feature4) + s(feature5) +
>            s(feature6) + ... + s(feature116),
>            data = k.data, family = binomial)
>
> Error in smooth.construct.tp.smooth.spec(object, data, knots):
> A term has fewer unique covariate combinations than specified maximum
> degrees of freedom
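The check behind this message is made per term in smooth.construct.tp.smooth.spec(), the function named in the error: a thin plate smooth cannot request a basis dimension k larger than the number of distinct values (or distinct covariate combinations) of its covariates. With the default k = 10 for one-covariate smooths, any predictor with fewer than 10 distinct values, for example a binary or coarsely coded feature, would trigger it. A minimal base-R sketch for locating such columns follows; the helper `too_few_unique()` and the toy data frame `toy` are hypothetical stand-ins for the poster's `k.data`.

```r
# Hypothetical helper: names of columns whose number of distinct values is
# below k, the basis dimension a one-covariate smooth would request.
too_few_unique <- function(df, k = 10) {
  n_unique <- vapply(df, function(v) length(unique(v)), integer(1))
  names(n_unique)[n_unique < k]
}

# Toy stand-in for k.data (assumption: the real data are not available here)
set.seed(1)
toy <- data.frame(
  feature4 = rnorm(100),                       # continuous: 100 distinct values
  feature5 = rbinom(100, 1, 0.5),              # binary: at most 2 distinct values
  feature6 = sample(1:5, 100, replace = TRUE)  # coarse score: at most 5 values
)

too_few_unique(toy)   # columns that s(..., k = 10) could not handle

# For the 3-D term the analogous count is over rows, e.g.
# nrow(unique(k.data[, c("x", "y", "z")])) must be at least 50 for k = 50.
```

Columns flagged this way could be given a smaller k (for instance s(feature6, k = 4) for a predictor with 5 distinct values), entered linearly, or treated as factors; which is appropriate depends on the data.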
>
> # I was going to follow this with backward selection by hypothesis testing
> # (removing the term with the highest p-value, one at a time) and also an
> # AIC comparison of all the models
>
> From the help file entitled 'Generalised additive models with integrated
> smoothness estimation' I calculated the following. Where do I go from here?
> A) "k is the basis dimension of a given term...if k is not specified
> k=10*3^(d-1) where 'd' is the number of covariates for this term"
> My calculation: for all my terms except the first, d = 1, so k = 10*3^0 = 10.
> B) "You must have more unique combinations of covariates than the model has
> total parameters"
> My calculation: total parameters = sum of basis dimensions (50 + 10*113)
> + sum of non-spline terms (0) - number of spline terms (114) = 1066
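Both calculations can be reproduced in a few lines of base R. The helper `default_k()` and the variable names are illustrative only; the formula and the counting rule are the ones quoted from the mgcv help above, following the poster's reading (no parametric terms counted). The result suggests the whole-model condition in B) is probably not the obstacle: 1066 parameters is well below the 3038 records, so unless the records are heavily duplicated there should be more unique covariate combinations than parameters. The error message instead refers to a single term, i.e. the per-term condition.

```r
# A) Default basis dimension quoted from the mgcv help:
#    k = 10 * 3^(d - 1) for a smooth of d covariates.
default_k <- function(d) 10 * 3^(d - 1)
default_k(1)   # 10: one-covariate smooths such as s(feature4)
default_k(3)   # 90: s(x, y, z) if k had not been set to 50

# B) Total parameters, following the poster's reading of the help file
basis_dims  <- c(50, rep(default_k(1), 113))  # s(x, y, z, k = 50) + 113 1-D smooths
n_nonspline <- 0                              # no parametric terms counted
n_spline    <- length(basis_dims)             # 114 smooth terms

total_params <- sum(basis_dims) + n_nonspline - n_spline
total_params           # 1066
total_params < 3038    # TRUE: fewer parameters than records
```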
>
> *gam package:
> I think the stepwise selection provided by the gam package would be useful
> for finding the best predictive model. I am following the example on p. 283
> of 'Statistical Models in S' (Chambers and Hastie, 1993).
> # I start with a full model where all predictors enter linearly
>> k.start<-gam(label~., data=k.data, family=binomial)
>
> # set up a scope list giving the candidate forms of each term, e.g. .~1 + x + s(x)
> # ignore the first column of the data set
>> k.scope<-gam.scope(k.data[,-1])
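In a scope list, each element is named after a predictor and holds a formula listing the forms that term may take during the search: dropped (1), linear, or smooth. The hypothetical helper `make_scope()` below builds a list of that shape in base R purely for illustration; it is not the gam package's gam.scope() implementation. One point worth checking against the gam documentation: gam.scope() has a `response` argument (default 1) giving the column to leave out of the scope, so if the response column has already been removed with k.data[,-1], the first remaining column (a predictor) may be excluded instead.

```r
# Hypothetical helper (not gam::gam.scope): for each predictor name v,
# build the formula ~1 + v + s(v), i.e. "drop it", "linear" or "smooth".
make_scope <- function(vars) {
  scope <- lapply(vars, function(v)
    as.formula(paste0("~1 + ", v, " + s(", v, ")")))
  names(scope) <- vars
  scope
}

k.scope.demo <- make_scope(c("x", "feature4"))
deparse(k.scope.demo$feature4)   # "~1 + feature4 + s(feature4)"
```

The formulas only need to parse; s() does not have to exist until a model is actually fitted from them.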
>
> # start stepwise selection
>> k.step<-step(k.start,k.scope)
> #condensed output
> Start: AIC=1549.48
> label~x+y+z+feature4+feature5+...+feature116
>               Df Deviance    AIC
> <none>              1319.5 1549.5
> - feature54   -1    1319.2 1551.2
> - feature26   -1    1319.2 1551.2
> ...
> - feature12   -1    1357.4 1589.4
> There were 50 or more warnings (use warnings() to see the first 50)
>
> # all 50 warnings are the same
>> warnings()
> Warning messages:
> 1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x[, jj,
> drop = FALSE], y, wt, offset = object$offset, ...
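This warning comes from glm.fit() and means that in some fitted models at least one observation's fitted probability is numerically 0 or 1. That usually signals complete or quasi-complete separation: some predictor, or combination of predictors, (almost) perfectly splits the 0s from the 1s, so the maximum likelihood estimates drift towards infinity. With 116 linear predictors that is plausible here, though the output alone does not confirm which terms are responsible. A minimal base-R illustration on made-up, perfectly separated data, capturing the warning text instead of printing it:

```r
# Made-up data: y is 0 whenever x <= 3 and 1 whenever x >= 4,
# so x perfectly separates the two classes.
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)

msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))  # record the warning text
    invokeRestart("muffleWarning")         # and suppress printing it
  }
)

print(msgs)      # includes the "fitted probabilities numerically 0 or 1" message
coef(fit)["x"]   # a very large slope: the MLE does not exist under separation
```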
>
> # it does not seem to get past the original linear model; it should show
> # all the steps taken to reach the final model
>> k.step$anova
>   Step Df Deviance Resid. Df Resid. Dev      AIC
> 1      NA       NA      2922   1317.599 1549.599
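The single row in `k.step$anova` is consistent with the table printed at the start: backward elimination by AIC takes the row with the smallest AIC and stops when that row is `<none>`, meaning no single-term deletion lowers AIC. In the condensed output, `<none>` (1549.5) is below every deletion shown, so the search ends at the starting model. The sketch below applies that rule to the AIC values copied from the output; `next_step()` is a hypothetical helper, not part of R. Separately, the layout of that table (`<none>` and `- term` rows with Df, Deviance and AIC) is the layout of stats::step(), which is not an S3 generic and documents `scope` as either a single formula or a list with `upper` and `lower` components, not a per-term list from gam.scope(). The gam package provides its own step.gam() (renamed step.Gam() in later versions) for scope lists, so it may be worth checking which function was actually run.

```r
# AIC values copied from the condensed step() output above
aic <- c("<none>"      = 1549.5,
         "- feature54" = 1551.2,
         "- feature26" = 1551.2,
         "- feature12" = 1589.4)

# Hypothetical helper: the row a backward AIC search would take next.
# If it is "<none>", no deletion improves AIC and the search stops.
next_step <- function(aic) names(aic)[which.min(aic)]

next_step(aic)   # "<none>": stop at the starting model
```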
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



Received on Thu Jun 22 02:16:04 2006

