Re: [R] Anova function and glm.nb

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon, 07 Apr 2008 22:21:55 +0100 (BST)

The re-estimated version is the principled one -- the other is a computational shortcut.

However, what you model might choose depends on why you are doing model selection, and also if there is a subject-specific reason to consider the terms in this order.

And if you look in MASS (the book) you will see a similar example and discussion of how to treat it. It uses dropterm(), which is likely to be preferable to either of these approaches.

On Mon, 7 Apr 2008, Nelson, Gary (FWE) wrote:

> Hi All,
>
> I am using the glm.nb function from the MASS package (current version)
> to fit and compare GLMs with negative binomial error distributions. My
> question is: what is the appropriate method to use in the anova function
> to compare models? If only one fitted object like
>
> m<-glm.nb(number<-p+sal+temp,data=data)
>
> is specified in the anova function (anova(m)), a fixed theta is used to
> generate the analysis of deviance:
>
> Analysis of Deviance Table
> Model: Negative Binomial(0.2345), link: log
> Response: number
> Terms added sequentially (first to last)
>
> Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL 117 122.707
> p 1 11.327 116 111.380 0.001
> sal 1 2.286 115 109.094 0.131
> tem 1 1.979 114 107.115 0.159
> ph 1 2.567 113 104.549 0.109
> Warning message:
> In anova.negbin(m) : tests made without re-estimating 'theta'
>
>
> If multiple fitted objects like
>
> m1<-glm.nb(number~1,data=data)
> m2<-glm.nb(number~p,data=data)
> m3<-glm.nb(number~p+sal,data=data)
> m4<-glm.nb(number~p+sal+temp,data=data)
>
>
> is specified (anova(m1,m2,m3,4)), the theta is assumed re-estimated in
> each case to calculate the likelihood ratio tests:
>
> Likelihood ratio tests of Negative Binomial Models
> Response: number
> Model theta Resid. df 2 x log-lik. Test df LR
> stat. Pr(Chi)
> 1 1 0.1892056 117 -527.7463
>
> 2 p 0.2153105 116 -517.9349 1 vs 2 1
> 9.811382 0.001734351
> 3 p + sal 0.2214626 115 -515.7942 2 vs 3 1
> 2.140706 0.143435894
> 4 p + sal + tem 0.2261900 114 -513.8846 3 vs 4 1
> 1.909643 0.167002884
> 5 p + sal + tem + ph 0.2344827 113 -511.3633 4 vs 5 1
> 2.521237 0.112322429
>
>
> The conclusions are the same, but I'd like to know if one method is
> favored over the other.
>
> Thanks,
>
> Gary Nelson.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 07 Apr 2008 - 21:25:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 07 Apr 2008 - 21:30:28 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive