Re: [R] glm model syntax

From: Berwin A Turlach <berwin_at_maths.uwa.edu.au>
Date: Sat, 17 May 2008 00:27:15 +0800

G'day Harold,

On Fri, 16 May 2008 11:43:32 -0400
"Doran, Harold" <HDoran_at_air.org> wrote:

> N+M gives only the main effects, N:M gives only the interaction, and
> N*M gives the main effects and the interaction.

I guess this raises the question of what you mean by "N:M gives only the interaction" ;-)

Consider:

R> (M <- gl(2, 1, length=12))
 [1] 1 2 1 2 1 2 1 2 1 2 1 2
Levels: 1 2
R> (N <- gl(2, 6))
 [1] 1 1 1 1 1 1 2 2 2 2 2 2
Levels: 1 2
R> dat <- data.frame(y= rnorm(12), N=N, M=M)
R> dim(model.matrix(y~N+M, dat))
[1] 12 3
R> dim(model.matrix(y~N:M, dat))
[1] 12 5
R> dim(model.matrix(y~N*M, dat))
[1] 12 4

Why does the model matrix of y~N:M have more columns than the model matrix of y~N*M if the former contains only the interaction and the latter contains main terms and interaction? Of course, if we drop the dim() call and look at the matrices themselves, we will see why. Moreover, it seems that the model matrix constructed from y~N:M has a redundant column.
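
For anyone who wants to look without retyping, here is a minimal sketch of that inspection. It assumes the same M, N and dat as above; the set.seed() call and the names X_int, X_full, rank_int and rank_full are additions made only for this illustration.

```r
# Rebuild the example data; set.seed() is an addition for reproducibility.
set.seed(1)
M <- gl(2, 1, length = 12)
N <- gl(2, 6)
dat <- data.frame(y = rnorm(12), N = N, M = M)

X_int  <- model.matrix(y ~ N:M, dat)    # the "interaction only" formula
X_full <- model.matrix(y ~ N * M, dat)  # main terms plus interaction

colnames(X_int)   # intercept plus one indicator column per N-by-M cell
colnames(X_full)  # intercept, the two main-effect columns, one interaction column

# The cell indicators sum to the intercept column, so one column is redundant.
cells_sum_to_one <- all(rowSums(X_int[, -1]) == 1)

# Numerical rank of each matrix, computed from its QR decomposition.
rank_int  <- qr(X_int)$rank
rank_full <- qr(X_full)$rank
c(cols_int = ncol(X_int), rank_int = rank_int,
  cols_full = ncol(X_full), rank_full = rank_full)
```

Both matrices should come out with the same rank even though y~N:M carries one more column.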

Furthermore:

R> D1 <- model.matrix(y~N*M, dat)
R> D2 <- model.matrix(y~N:M, dat)
R> resid(lm(D1~D2-1))

This shows that the column space of the model matrix of y~N*M is completely contained within the column space of the model matrix of y~N:M, and it is easy to check that the reverse is also true. So it seems to me that y~N:M and y~N*M actually fit the same model. To see how to construct one design matrix from the other, try:

R> lm(D1~D2-1)
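
The check in both directions can be sketched as follows. This is only an illustration under the same assumptions as above (factors M and N, data frame dat); set.seed() and the names max_res_12, max_res_21 and rank_both are introduced here and are not part of the original commands.

```r
# Rebuild the example data; set.seed() is an addition for reproducibility.
set.seed(1)
M <- gl(2, 1, length = 12)
N <- gl(2, 6)
dat <- data.frame(y = rnorm(12), N = N, M = M)

D1 <- model.matrix(y ~ N * M, dat)
D2 <- model.matrix(y ~ N:M, dat)

# Regress the columns of one design matrix on the other (no extra intercept).
# Residuals that vanish up to rounding error mean that every column of the
# response matrix lies in the column space of the predictor matrix.
max_res_12 <- max(abs(resid(lm(D1 ~ D2 - 1))))  # D1 inside span(D2)?
max_res_21 <- max(abs(resid(lm(D2 ~ D1 - 1))))  # D2 inside span(D1)?

# Alternative check: putting the two matrices side by side adds no new
# directions, so the rank of the combined matrix equals the rank of each.
rank_both <- qr(cbind(D1, D2))$rank
c(max_res_12 = max_res_12, max_res_21 = max_res_21, rank_both = rank_both)
```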

Thus, I guess the answer is that y~N+M fits a model with main terms only, while y~N:M and y~N*M fit the same model, namely a model with main and interaction terms. The two formulations just create different design matrices, which has to be taken into account if one tries to interpret the estimates.
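
A quick way to convince oneself is to fit both formulas and compare the fits. The sketch below uses lm() on the same made-up data as above; set.seed() and the names fit_int and fit_full are additions for illustration only.

```r
# Rebuild the example data; set.seed() is an addition for reproducibility.
set.seed(1)
M <- gl(2, 1, length = 12)
N <- gl(2, 6)
dat <- data.frame(y = rnorm(12), N = N, M = M)

fit_int  <- lm(y ~ N:M, dat)    # cell-indicator parametrisation
fit_full <- lm(y ~ N * M, dat)  # main terms plus interaction

# Same column space, hence the same fitted values and residual sum of squares.
same_fit <- all.equal(fitted(fit_int), fitted(fit_full))
same_rss <- all.equal(deviance(fit_int), deviance(fit_full))

# The coefficients differ, and y ~ N:M reports one NA for its redundant column.
coef(fit_int)
coef(fit_full)
```

The fitted values agree, but the coefficients answer different questions, which is why the parametrisation matters for interpretation.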

Of course, all of the above assumes that N and M are actually factors, something that Birgit did not specify. If N and M (or only one of them) are numeric vectors, then the constructed matrices might be different, but this is left as an exercise. ;-) (Apparently, if N and M are both numeric, then your summary is pretty much correct.)
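
For the case where both are numeric, a small sketch along the following lines gives the column counts. The data frame dat_num and its random values are hypothetical, made up purely for this illustration.

```r
# Hypothetical data with numeric N and M; the values are arbitrary.
set.seed(1)
dat_num <- data.frame(y = rnorm(12), N = rnorm(12), M = rnorm(12))

formulas <- list(main = y ~ N + M, inter = y ~ N:M, full = y ~ N * M)

# Number of columns of the model matrix for each formula.
n_cols <- sapply(formulas, function(f) ncol(model.matrix(f, dat_num)))
n_cols

# With numeric variables, y ~ N:M contains only the intercept and the
# product column, so here ":" really does give the interaction alone.
colnames(model.matrix(y ~ N:M, dat_num))
```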

Cheers,

        Berwin


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 16 May 2008 - 17:20:34 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 16 May 2008 - 18:30:37 GMT.
