Re: [R] contingency table analysis; generalized linear model

From: Trevor Hastie <hastie_at_stanford.edu>
Date: Wed 10 Jan 2007 - 15:06:55 GMT

> Date: Tue, 9 Jan 2007 11:13:41 +0000 (GMT)
> From: Mark Difford <mark_difford@yahoo.co.uk>
> Subject: Re: [R] contingency table analysis; generalized linear model
>
> Dear List,
>
> I would appreciate help on the following matter:
>
> I am aware that higher dimensional contingency tables can be
> analysed using either log-linear models or as a poisson regression
> using a generalized linear model:
>
> log-linear:
> loglm(~Age+Site, data=xtabs(~Age+Site, data=SSites.Rev,
> drop.unused.levels=T))
>
> GLM:
> glm.table <- as.data.frame(xtabs(~Age+Site, data=SSites.Rev,
> drop.unused.levels=T))
> glm(Freq ~ Age + Site, data=glm.table, family='poisson')
>
> where Site is a factor and Age is cast as a factor by xtabs() and
> treated as such.
>
> **Question**:
> Is it acceptable to step away from contingency table analysis by
> recasting Age as a numerical variable, and redoing the analysis as:
>
> glm(Freq ~ as.numeric(Age) + Site, data=glm.table, family='poisson')
>
> My reasons for wanting to do this are to be able to include non-
> linear terms in the model, using say restricted or natural cubic
> splines.
>
> Thank you in advance for your help.
> Regards,
> Mark Difford.
>
>
> ---------------------------------------------------------------
> Mark Difford
> Ph.D. candidate, Botany Department,
> Nelson Mandela Metropolitan University,
> Port Elizabeth, SA.

Yes it is, and it is often the preferred way to view the analysis. In
this case it looks like Freq is measuring something like species
abundance, and it is natural to model this as a Poisson count via a
log-link glm. As such you are free to include any reasonable functions
of your predictors in modeling the mean.
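
As a rough illustration of including "reasonable functions of your predictors", here is a minimal sketch that fits Age linearly and then through a natural cubic spline. The data frame toy is simulated and purely hypothetical (it stands in for SSites.Rev, which is not shown in the thread), and df = 3 is an arbitrary choice.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)
# Hypothetical stand-in for SSites.Rev: one row per observed individual
toy <- data.frame(
  Age  = sample(1:10, 400, replace = TRUE),
  Site = factor(sample(c("A", "B", "C"), 400, replace = TRUE))
)

# Cross-tabulate, then flatten to one row per (Age, Site) cell with its count
tab <- as.data.frame(xtabs(~ Age + Site, data = toy))
tab$Age <- as.numeric(as.character(tab$Age))  # factor labels -> numeric ages

# Poisson log-link glm: Age linear, then Age via a natural cubic spline
fit_lin <- glm(Freq ~ Age + Site, data = tab, family = poisson)
fit_ns  <- glm(Freq ~ ns(Age, df = 3) + Site, data = tab, family = poisson)

# Likelihood-ratio comparison of the two nested fits
anova(fit_lin, fit_ns, test = "Chisq")
```

The linear fit is nested in the spline fit, so the anova() comparison gives an informal check of whether the extra flexibility in Age is needed.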

Log-linear models are typically presented as ways of analyzing dependence between
categorical variables, when represented as multi-way tables. The appropriate multinomial
models, conditioning on certain marginals, happen to be equivalent to Poisson glms with
appropriate terms included.
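
The equivalence can be checked numerically on a small example. The sketch below uses a made-up 4 x 3 table (tab2 is hypothetical, not the poster's data) and base R's loglin(), the fitting routine that MASS::loglm() wraps, for the independence model; the same model is then fitted as a Poisson glm.

```r
set.seed(2)
# Hypothetical two-way table of counts (Age class by Site)
tab2 <- matrix(rpois(12, lambda = 20), nrow = 4,
               dimnames = list(Age = c("1", "2", "3", "4"),
                               Site = c("A", "B", "C")))
tab2 <- as.table(tab2)

# Log-linear independence model, fitted by iterative proportional fitting
ll <- loglin(tab2, margin = list(1, 2), fit = TRUE, print = FALSE)

# The same model as a Poisson glm on the flattened table
df2 <- as.data.frame(tab2)  # columns Age, Site, Freq
pg  <- glm(Freq ~ Age + Site, data = df2, family = poisson)

# Likelihood-ratio statistic vs residual deviance, and the fitted counts
c(loglin = ll$lrt, glm = deviance(pg))
all.equal(as.vector(ll$fit), unname(fitted(pg)), tolerance = 1e-4)
```

Both fits have (4 - 1) * (3 - 1) = 6 residual degrees of freedom, and the fitted cell counts are the usual row total times column total over grand total.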

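I would suggest that in your data preparation you run

glm.table[,"Age"] <- as.numeric(as.character(glm.table[,"Age"]))

at the start, so that you can then think of your data in the right way.
(This assumes the levels of Age are the ages themselves; xtabs() has made
Age a factor, and as.numeric() alone would return its integer level codes.)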
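
One caveat on the conversion, assuming Age arrives as a factor (as it does from as.data.frame(xtabs(...))): as.numeric() applied directly to a factor returns the integer level codes, which match the ages only when the ages happen to be 1, 2, 3, ... with none missing. A small sketch with made-up ages (age_f is hypothetical):

```r
# Hypothetical ages that are not consecutive integers starting at 1
age_f <- factor(c(5, 10, 20, 10, 5))

as.numeric(age_f)                 # level codes: 1 2 3 2 1
as.numeric(as.character(age_f))   # labels as numbers: 5 10 20 10 5
as.numeric(levels(age_f))[age_f]  # same result, converting each level once
```

If the levels are class labels rather than numbers, a numeric score for each class has to be chosen explicitly instead.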

Trevor Hastie


   Trevor Hastie                                   hastie@stanford.edu
   Professor & Chair, Department of Statistics, Stanford University
   Phone: (650) 725-2231 (Statistics)          Fax: (650) 725-8977
          (650) 498-5233 (Biostatistics)       Fax: (650) 725-6951
   URL: http://www-stat.stanford.edu/~hastie
   Address: Room 104, Department of Statistics, Sequoia Hall
            390 Serra Mall, Stanford University, CA 94305-4065





R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu Jan 11 02:18:37 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 10 Jan 2007 - 16:30:32 GMT.
