Re: [R] Anova - adjusted or sequential sums of squares?

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Thu 21 Apr 2005 - 00:06:32 EST

michael watson (IAH-C) wrote:
> Hi
>
> I am performing an analysis of variance with two factors, each with two
> levels. I have differing numbers of observations in each of the four
> combinations, but all four combinations *are* present (2 of the factor
> combinations have 3 observations, 1 has 4 and 1 has 5)
>
> I have used both anova(aov(...)) and anova(lm(...)) in R and it gave the
> same result - as expected. I then plugged this into minitab, performed
> what minitab called a General Linear Model (I have to use this in
> minitab as I have an unbalanced data set) and got a different result.
> After a little mining this is because minitab, by default, uses the type
> III adjusted SS. Sure enough, if I changed minitab to use the type I
> sequential SS, I get exactly the same results as aov() and lm() in R.
>
> So which should I use? Type I sequential SS or Type III adjusted SS?
> Minitab help tells me that I would "usually" want to use type III
> adjusted SS, as type I sequential "sums of squares can differ when your
> design is unbalanced" - which mine is. The R functions I am using are
> clearly using the type I sequential SS.

Install the fortunes package and try
 > fortune("Venables")

I'm really curious to know why the "two types" of sum of squares are called "Type I" and "Type III"! This is a very common misconception, particularly among SAS users who have been fed this nonsense quite often for all their professional lives. Fortunately the reality is much simpler. There is, by any sensible reckoning, only ONE type of sum of squares, and it always represents an improvement sum of squares of the outer (or alternative) model over the inner (or null hypothesis) model. What the SAS highly dubious classification of sums of squares does is to encourage users to concentrate on the null hypothesis model and to forget about the alternative. This is always a very bad idea and not surprisingly it can lead to nonsensical tests, as in the test it provides for main effects "even in the presence of interactions", something which beggars definition, let alone belief.
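
To make the "improvement sum of squares" idea concrete, here is a minimal sketch. The data are simulated and entirely made up; only the cell counts (3, 3, 4 and 5) follow the original question, and the names d, A, B, y, inner and outer are illustrative assumptions. It computes the drop in residual sum of squares from the inner model to the outer model and shows where that same number turns up in the anova() output.

```r
# Made-up data: two 2-level factors with cell counts 3, 3, 4, 5
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
)
d$y <- rnorm(nrow(d))

inner <- lm(y ~ A, data = d)      # inner (null hypothesis) model
outer <- lm(y ~ A + B, data = d)  # outer (alternative) model

# Improvement sum of squares: reduction in residual SS, inner -> outer
ss_improvement <- deviance(inner) - deviance(outer)

# The same quantity in the explicit two-model comparison ...
cmp <- anova(inner, outer)
# ... and in the line for B of the sequential table of the outer model
seq_tab <- anova(outer)

print(ss_improvement)
print(cmp)
print(seq_tab)
```

The "Sum of Sq" entry of the two-model comparison and the "Sum Sq" entry for B in the sequential table should both equal ss_improvement, because each describes the same pair of models.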

In the words of the master, "there is ... only one type of sum of squares", which is the one that R reports. The others are awkward fictions created for times when one could only afford to fit one or two linear models per week and therefore wanted the output to give results for all possible tests one could conceive, even if the models being tested didn't make sense.
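
One way to act on this in R is to name the two models being compared rather than to pick a "type". The following is only a sketch under stated assumptions: the data are simulated and made up (cell counts 3, 3, 4 and 5 as in the original question), and the names d, A, B, y, fit_B and fit_AB are illustrative. It shows that the line for A in a sequential table depends on the order of the terms in an unbalanced design, and that an explicit comparison of two fitted models, or drop1(), gives the sum of squares for A adjusted for B.

```r
# Made-up data: two 2-level factors with cell counts 3, 3, 4, 5
set.seed(2)
d <- data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
)
d$y <- rnorm(nrow(d))

# Sequential tables: what the line for A means depends on term order
tab_A_first <- anova(lm(y ~ A + B, data = d))  # A vs. intercept-only model
tab_B_first <- anova(lm(y ~ B + A, data = d))  # A vs. model containing B

# State the comparison explicitly instead of relying on term order
fit_B  <- lm(y ~ B, data = d)
fit_AB <- lm(y ~ A + B, data = d)
cmp <- anova(fit_B, fit_AB)                    # A adjusted for B

# drop1() removes each term in turn from the additive model
dr <- drop1(fit_AB, test = "F")

print(tab_A_first)
print(tab_B_first)
print(cmp)
print(dr)
```

The sum of squares for A from cmp, from dr and from tab_B_first should agree, since all three compare y ~ B with y ~ A + B; the value in tab_A_first answers a different question and will generally differ when the cell counts are unequal.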



