Re: [R] trouble with step() and stepAIC() selecting the best model

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 05 May 2006 - 20:02:06 EST

step() and stepAIC() are inappropriate for glm.nb.

On Fri, 5 May 2006, natalia norden wrote:

> Hello,
> I have some trouble using step() and stepAIC() functions.
> I'm predicting recruitment against several factors for different plant
> species using a negative binomial glm.

You are not using a negative binomial glm. A negative binomial fit is only a glm for fixed theta, and you are estimating theta. There is no reason to expect step() to support a function in the MASS package (although it appears that it does work under some circumstances).

glm.nb is support software for a book: please consult it. There is nothing reproducible here (not even the commands used to get the output you show), nor is the anova component shown. But for example

example(glm.nb)
stepAIC(quine.nb3)

works (as does step).

> Sometimes, summary(step(model)) or summary(stepAIC(model) does not
> select the best model (lowest AIC) but just stops before.
> For some species, step() works and stepAIC don't and in others, it's the
> opposite.
>
> The model is
>
> model=glm.nb(sdg.cens1 ~ seed.cens1 + log(in.lit+1) +
> log(trans.dir+1)+ log(trans.dif+1) + slope +log(pH+1) + log(CN+1) + site,
> data=data)
>
>
> For example, for species A using stepAIC() I get:
>
> Call:
> glm.nb(formula = tot.rec ~ slope + log(pH + 1) + log(CN + 1),
> data = data, init.theta = 0.0575099635396228, link = log)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -7.591e-01 -3.688e-01 -1.828e-01 -8.494e-08 1.520e+00
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 110.729 47.643 2.324 0.02012 *
> slope2 -34.202 972079.452 -3.52e-05 0.99997
> slope3 -1.423 1.371 -1.038 0.29928
> log(pH + 1) -51.244 19.544 -2.622 0.00874 **
> log(CN + 1) -9.132 6.906 -1.322 0.18602
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> (Dispersion parameter for Negative Binomial(0.0575) family taken to be 1)
>
> Null deviance: 34.068 on 156 degrees of freedom
> Residual deviance: 19.602 on 152 degrees of freedom
> AIC: 90.357
>
>
> But using step() I get:
>
> Step: AIC= 138.4
> tot.rec ~ log(mean.lit.rec + 1) + log(trans.dir + 1) + log(trans.dif +
> 1) + slope + log(pH + 1) + log(CN + 1) + site
>
>
> Call:
> glm.nb(formula = tot.rec ~ log(mean.lit.rec + 1) + log(trans.dir +
> 1) + log(trans.dif + 1) + slope + log(pH + 1) + log(CN +
> 1) + site, data = data, init.theta = 74070.0501196965, link = log)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -2.3110 -0.5450 -0.3179 0.0000 3.8755
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 6.809e+01 1.905e+01 3.574 0.000351 ***
> log(mean.lit.rec + 1) -1.238e+00 1.083e+00 -1.143 0.252987
> log(trans.dir + 1) 2.069e-01 1.227e+00 0.169 0.866120
> log(trans.dif + 1) -6.978e-01 1.578e+00 -0.442 0.658327
> slope2 -3.730e+01 1.022e+07 -3.65e-06 0.999997
> slope3 -9.195e-01 6.269e-01 -1.467 0.142488
> log(pH + 1) -3.152e+01 6.918e+00 -4.556 5.22e-06 ***
> log(CN + 1) -5.053e+00 3.404e+00 -1.485 0.137643
> site1 1.379e-01 4.618e-01 0.299 0.765278
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> (Dispersion parameter for Negative Binomial(91071.54) family taken to be 1)
>
> Null deviance: 145.586 on 156 degrees of freedom
> Residual deviance: 99.166 on 148 degrees of freedom
> AIC: 140.40
>
>
>
> However, for species B I get the opposite, it works with step() but not
> with stepAIC().
>
> Using step() AIC =414.28 and using stepAIC(), AIC=424.15 and the results
> are totally different.
>
> I would appreciate any help!!
> Thank you
>
> Natalia Norden
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri May 05 20:11:06 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 05 May 2006 - 22:10:00 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.