# RE: [R] Anova - adjusted or sequential sums of squares?

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Thu 21 Apr 2005 - 01:39:31 EST

On Wed, 20 Apr 2005, michael watson (IAH-C) wrote:

> I guess the real problem is this:
>
> As I have a different number of observations in each of the groups, the
> results *change* depending on which order I specify the factors in the
> model. This unnerves me. With a completely balanced design, this
> doesn't happen - the results are the same no matter which order I
> specify the factors.
>
> It's this reason that I have been given for using the so-called type III
>

This is one of many examples of an attempt to provide a mathematical answer to something that isn't a mathematical question.

As people have already pointed out, in any practical testing situation you have two models you want to compare. If you are working in an interactive statistical environment, or even in a modern batch-mode system, you can fit the two models and compare them. If you want to compare two other models, you can fit them and compare them.
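A minimal sketch of such a comparison in R, assuming a main-effects linear model fitted with `lm()`. The data are simulated and deliberately unbalanced; the data frame `d` and the names `fit_small` and `fit_big` are invented for illustration and do not come from the original question.

```r
# Simulated, unbalanced data (cell counts 9, 3, 2, 6); all names are illustrative.
set.seed(1)
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 8))),
  b = factor(rep(c("b1", "b2", "b1", "b2"), times = c(9, 3, 2, 6)))
)
d$y <- rnorm(nrow(d)) + (d$a == "a2") + 0.5 * (d$b == "b2")

fit_small <- lm(y ~ a, data = d)      # model without b
fit_big   <- lm(y ~ a + b, data = d)  # model with b

# F test comparing the two nested models: does b add anything once a is in?
cmp <- anova(fit_small, fit_big)
print(cmp)
```

The second row of the printed table is the test for adding `b` to a model that already contains `a`. Any other pair of nested models can be compared the same way.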

However, in the Bad Old Days this was inconvenient (or so I'm told). If you had half a dozen tests, and one of the models was the same in each test, it was a substantial saving of time and effort to fit this model just once.

This led to a system where you specify a model and a set of tests: e.g., I'm going to fit y~a+b+c+d and I want to test (some of) y~a vs y~a+b, y~a+b vs y~a+b+c and so on. Or, I want to test (some of) y~a+b+c vs y~a+b+c+d, y~a+b+d vs y~a+b+c+d and so on. This gives the "Types" of sums of squares, which are ways of specifying sets of tests. You could pick the "Type" so that the total number of linear models you had to fit was minimized. As these are merely a computational optimization, they don't have to make any real sense. Unfortunately, as with many optimizations, they have gained a life of their own.
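As a rough illustration of the two sets of tests described above, here is a sketch in R on simulated data. It assumes a main-effects model with no interactions. `anova()` on a single fit gives the sequential set, and `drop1()` gives the set where each term in turn is dropped from the full model. The data and the names `seq_tab` and `drop_tab` are made up for the example.

```r
# Simulated, unbalanced data; all names are illustrative.
set.seed(2)
n <- 30
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(18, 12))),
  b = factor(rep(c("b1", "b2", "b3", "b1", "b2", "b3"),
                 times = c(8, 6, 4, 2, 3, 7))),
  c = rnorm(n)
)
d$y <- rnorm(n) + (d$a == "a2") + 0.5 * d$c

fit <- lm(y ~ a + b + c, data = d)

# Sequential set: y~1 vs y~a, y~a vs y~a+b, y~a+b vs y~a+b+c
seq_tab <- anova(fit)

# "Drop one term" set: y~b+c, y~a+c, y~a+b, each vs y~a+b+c
drop_tab <- drop1(fit, test = "F")

print(seq_tab)
print(drop_tab)
```

The two tables agree on the last term (`c` here), because both are then comparing y~a+b with y~a+b+c. For the other terms they are answering different questions.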

The "Type III" sums of squares are the same regardless of order, but this is a bad property, not a good one. The question you are asking when you test "for" a term X really does depend on what other terms are in the model, so order really does matter. However, since you can do anything just by specifying two models and comparing them, you don't actually need to worry about any of this.
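To make the dependence on order concrete, here is a small sketch in R with simulated, unbalanced data (again, the data frame and variable names are invented for illustration). It shows that the sum of squares "for" `a` in a sequential table changes with the order of the terms, and that each version is just a particular two-model comparison.

```r
# Simulated, unbalanced data (cell counts 9, 3, 2, 6); all names are illustrative.
set.seed(3)
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 8))),
  b = factor(rep(c("b1", "b2", "b1", "b2"), times = c(9, 3, 2, 6)))
)
d$y <- rnorm(nrow(d)) + (d$a == "a2") + 0.5 * (d$b == "b2")

tab_ab <- anova(lm(y ~ a + b, data = d))  # row "a" compares y~1 with y~a
tab_ba <- anova(lm(y ~ b + a, data = d))  # row "a" compares y~b with y~b+a

ss_a_first <- tab_ab["a", "Sum Sq"]
ss_a_last  <- tab_ba["a", "Sum Sq"]

# The same quantity obtained by fitting the two models and comparing them.
explicit <- anova(lm(y ~ b, data = d), lm(y ~ b + a, data = d))
ss_a_explicit <- explicit[2, "Sum of Sq"]

print(c(a_first = ss_a_first, a_last = ss_a_last, explicit = ss_a_explicit))
```

The two sequential values for `a` differ because they correspond to different pairs of models. The value with `a` entered last matches the explicit comparison of y~b against y~b+a.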

-thomas

R-help@stat.math.ethz.ch mailing list