Re: [R] Type II and III sum of square in Anova (R, car package)

From: John Fox <jfox_at_mcmaster.ca>
Date: Tue 29 Aug 2006 - 08:07:37 EST


Dear Amasco,

Again, I'll answer briefly (since the written source that I previously mentioned has an extensive discussion):

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Amasco
> Miralisus
> Sent: Monday, August 28, 2006 2:21 PM
> To: r-help@stat.math.ethz.ch
> Cc: John Fox; Prof Brian Ripley; Mark Lyman
> Subject: Re: [R] Type II and III sum of square in Anova (R,
> car package)
>
> Hello,
>
> First of all, I would like to thank everybody who answered my
> question. Every post has added something to my knowledge of the topic.
> I now know why Type III SS are so questionable.
>
> As I understood from the R FAQ, there is disagreement among
> statisticians about which SS to use
> (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).
> However, most commercial statistical packages use Type III as
> the default (with orthogonal contrasts), as does STATISTICA,
> from which I am currently trying to migrate to R. This was
> probably done for the convenience of end-users who are
> not very experienced in theoretical statistics.
>

Note that the contrasts are only orthogonal in the row basis of the model matrix, not, with unbalanced data, in the model matrix itself.
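
A minimal sketch of this point, using a made-up one-factor layout (the factor f and its level counts are hypothetical, chosen only for illustration). It checks one simple property: each contr.sum column sums to zero over the levels of the factor, but the corresponding columns of the model matrix no longer sum to zero once the levels have unequal counts.

```r
# Hypothetical unbalanced layout: levels a, b, c observed 4, 2 and 1 times
f <- factor(rep(c("a", "b", "c"), times = c(4, 2, 1)))

# Row basis: one row per level of the factor
C <- contr.sum(nlevels(f))
row_basis_sums <- colSums(C)

# Model matrix: one row per observation, so unequal counts weight the rows
X <- model.matrix(~ f, contrasts.arg = list(f = "contr.sum"))
model_matrix_sums <- colSums(X[, -1, drop = FALSE])

print(row_basis_sums)      # zero for every contrast column
print(model_matrix_sums)   # nonzero here because the counts differ
```

With equal counts per level the second set of sums would also be zero, which is why the distinction only matters for unbalanced data.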

> I am aware that the same result could be produced using the standard
> anova() function with Type I "sequential" SS, supplemented by the
> drop1() function, but this approach will look quite
> complicated to people without a substantial background in
> statistics, such as non-math students. I would prefer an easier way,
> possibly more universal, though also probably more "for
> dummies" :) If I am not mistaken, the car package by John Fox, with
> its nice Anova() function, is a reasonable alternative for
> anyone who wishes to simply perform a quick statistical analysis
> without being afraid of messing something up in model fitting. Of
> course orthogonal contrasts have to be specified (for example
> contr.sum) in the case of Type III SS.
>
> Therefore, I would like to reformulate my questions, to make
> it easier for you to answer:
>
> 1. The first question relates to the answer by Professor Brian
> Ripley: Did I understand correctly from the recommended paper
> (Bill Venables' 'exegeses' paper) that there is not much sense
> in testing main effects if the interaction is significant?
>

Many are of this opinion. I would put it a bit differently: Properly formulated, tests of main effects in the presence of interactions make sense (i.e., have a straightforward interpretation in terms of population marginal means) but probably are not of interest.
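
To make the "population marginal means" reading concrete, here is a small sketch on made-up data (the data frame d and the factor names a and b are hypothetical). In a saturated 2 x 2 model fitted with sum-to-zero contrasts, the coefficient for a should equal half the difference between the unweighted marginal means of the cell means, so a test that this coefficient is zero is a test that those marginal means are equal, whether or not a and b interact.

```r
# Hypothetical unbalanced 2 x 2 data, made up for illustration only
d <- data.frame(
  a = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(10, 12, 11, 20, 15, 18, 16)
)

# Saturated model with sum-to-zero contrasts for both factors
mod <- lm(y ~ a * b, data = d,
          contrasts = list(a = "contr.sum", b = "contr.sum"))

# Cell means, then unweighted marginal means for the levels of a
cell <- tapply(d$y, list(d$a, d$b), mean)
marg_a <- rowMeans(cell)

# Half the difference of the marginal means versus the fitted coefficient
half_diff <- unname(marg_a["a1"] - marg_a["a2"]) / 2
coef_a <- unname(coef(mod)["a1"])
print(all.equal(coef_a, half_diff))
```

The marginal means here are averages of cell means, not averages over observations, which is what makes the hypothesis well defined in an unbalanced design.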

> 2. If I understood the post by John Fox correctly, I could safely use
> the Anova(., type="III") function from car for ANOVA analyses in
> R, for both balanced and unbalanced designs? Provided, of course,
> that the model was fitted with orthogonal contrasts.
> Something like below:
> library(car)
> mod <- aov(response ~ factor1 * factor2, data=mydata,
>            contrasts=list(factor1=contr.sum, factor2=contr.sum))
> Anova(mod, type="III")
>

Yes (or you could reset the contrasts option), but why do you appear to prefer the "type-III" tests to the "type-II" tests?
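
Resetting the contrasts option might look like the sketch below. The data frame d is made up for illustration, and the call to Anova() is guarded so that it only runs if the car package is installed.

```r
# Hypothetical unbalanced 2 x 2 data, made up for illustration only
d <- data.frame(
  a = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(10, 12, 11, 20, 15, 18, 16)
)

# Use sum-to-zero contrasts for unordered factors and polynomial
# contrasts for ordered factors; options() returns the old setting
old <- options(contrasts = c("contr.sum", "contr.poly"))

# No contrasts argument needed now: the session default is picked up
mod <- lm(y ~ a * b, data = d)
used <- unlist(mod$contrasts)
print(used)

# Only runs if the car package happens to be installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod, type = "III"))
}

# Restore the previous setting so later fits are not affected
options(old)
```

Changing the option affects every model fitted afterwards in the session, so restoring the old value (or passing contrasts= to the individual fit, as in your example) keeps the change local.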

> It was also said in most of your posts that the decision about
> which type of SS to use has to be made on the basis of the
> hypothesis we want to test. Therefore, let's assume that I
> would like to test the significance of both factors, and if
> either of them is significant, I plan to use post-hoc tests to
> explore the difference(s) between levels of the significant factor(s).
>

Your statement is too vague to imply what kind of tests you should use. I think that people are almost always interested in "main effects" when interactions to which they are marginal are negligible. In this situation, both "type-II" and "type-III" tests are appropriate, and "type-II" tests would usually be more powerful.
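
As a rough illustration of how the two sums of squares for a main effect differ in construction, the sketch below computes both by hand in base R as differences in residual sums of squares. The data frame d is hypothetical. The "type-II" SS for a compares the models y ~ b and y ~ a + b; the "type-III" SS drops only the column for a from the full sum-to-zero model matrix while keeping the interaction column.

```r
# Hypothetical unbalanced 2 x 2 data, made up for illustration only
d <- data.frame(
  a = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(10, 12, 11, 20, 15, 18, 16)
)

# "Type-II" SS for a: a after b, with the a:b interaction assumed absent
ss2_a <- deviance(lm(y ~ b, data = d)) - deviance(lm(y ~ a + b, data = d))

# "Type-III" SS for a: remove only a's column from the full model matrix
full <- lm(y ~ a * b, data = d,
           contrasts = list(a = "contr.sum", b = "contr.sum"))
X <- model.matrix(full)
reduced <- lm.fit(X[, colnames(X) != "a1", drop = FALSE], d$y)
ss3_a <- sum(reduced$residuals^2) - deviance(full)

print(c(typeII = ss2_a, typeIII = ss3_a))
```

With balanced data the two numbers would coincide; with unbalanced data they generally differ because they correspond to different hypotheses.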

Regards,
John

> Thank you in advance, Amasco
>
> On 8/27/06, John Fox <jfox@mcmaster.ca> wrote:
> > Dear Amasco,
> >
> > A complete explanation of the issues that you raise is awkward in an
> > email, so I'll address your questions briefly. Section 8.2 of my text,
> > Applied Regression Analysis, Linear Models, and Related Methods (Sage,
> > 1997) has a detailed discussion.
> >
> > (1) In balanced designs, so-called "Type I," "II," and "III" sums of
> > squares are identical. If the STATA manual says that Type II tests are
> > only appropriate in balanced designs, then that doesn't make a whole
> > lot of sense (unless one believes that Type-II tests are nonsense,
> > which is not the case).
> >
> > (2) One should concentrate not directly on different "types" of sums
> > of squares, but on the hypotheses to be tested. Sums of squares and
> > F-tests should follow from the hypotheses. Type-II and Type-III tests
> > (if the latter are properly formulated) test hypotheses that are
> > reasonably construed as tests of main effects and interactions in
> > unbalanced designs. In unbalanced designs, Type-I sums of squares
> > usually test hypotheses of interest only by accident.
> >
> > (3) Type-II sums of squares are constructed obeying the principle of
> > marginality, so the kinds of contrasts employed to represent factors
> > are irrelevant to the sums of squares produced. You get the same
> > answer for any full set of contrasts for each factor. In general, the
> > hypotheses tested assume that terms to which a particular term is
> > marginal are zero. So, for example, in a three-way ANOVA with factors
> > A, B, and C, the Type-II test for the AB interaction assumes that the
> > ABC interaction is absent, and the test for the A main effect assumes
> > that the ABC, AB, and AC interactions are absent (but not necessarily
> > the BC interaction, since the A main effect is not marginal to this
> > term). A general justification is that we're usually not interested,
> > e.g., in a main effect that's marginal to a nonzero interaction.
> >
> > (4) Type-III tests do not assume that terms higher-order to the term
> > in question are zero. For example, in a two-way design with factors A
> > and B, the type-III test for the A main effect tests whether the
> > population marginal means at the levels of A (i.e., averaged across
> > the levels of B) are the same. One can test this hypothesis whether or
> > not A and B interact, since the marginal means can be formed whether
> > or not the profiles of means for A within levels of B are parallel.
> > Whether the hypothesis is of interest in the presence of interaction
> > is another matter, however. To compute Type-III tests using
> > incremental F-tests, one needs contrasts that are orthogonal in the
> > row-basis of the model matrix. In R, this means, e.g., using
> > contr.sum, contr.helmert, or contr.poly (all of which will give you
> > the same SS), but not contr.treatment. Failing to be careful here will
> > result in testing hypotheses that are not reasonably construed, e.g.,
> > as hypotheses concerning main effects.
> >
> > (5) The same considerations apply to linear models that include
> > quantitative predictors -- e.g., ANCOVA. Most software will not
> > automatically produce sensible Type-III tests, however.
> >
> > I hope this helps,
> > John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces@stat.math.ethz.ch
> > > [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Amasco
> > > Miralisus
> > > Sent: Saturday, August 26, 2006 5:07 PM
> > > To: r-help@stat.math.ethz.ch
> > > Subject: [R] Type II and III sum of square in Anova (R, car package)
> > >
> > > Hello everybody,
> > >
> > > I have some questions on ANOVA in general and on ANOVA in R
> > > in particular.
> > > I am not a statistician, so I would appreciate it if you could
> > > answer in a simple way.
> > >
> > > 1. First of all, a more general question. The standard anova()
> > > function for lm() or aov() models in R implements Type I
> > > (sequential) sums of squares, which are not well suited for
> > > unbalanced ANOVA. Therefore it is better to use the Anova()
> > > function from the car package, which was programmed by John Fox
> > > to use Type II and Type III sums of squares. Did I get the point?
> > >
> > > 2. Now a more specific question. Type II sums of squares are not
> > > well suited for unbalanced ANOVA designs either (as stated in the
> > > STATISTICA help), so is the general rule of thumb to use Anova()
> > > with Type II SS only for balanced ANOVA and Anova() with Type III
> > > SS for unbalanced ANOVA?
> > > Is this the correct interpretation?
> > >
> > > 3. I have found a post from John Fox in which he wrote that Type III
> > > SS could be misleading if someone uses certain contrasts. What is
> > > this about?
> > > Could you please advise when it is appropriate to use Type II and
> > > when Type III SS? I do not use contrasts for comparisons, just
> > > general ANOVA with subsequent Tukey post-hoc comparisons.
> > >
> > > Thank you in advance,
> > > Amasco
> > >
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>

Received on Tue Aug 29 09:26:19 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 29 Aug 2006 - 10:25:18 EST.
