# Re: [R] unbalanced one-way ANOVA

From: Douglas Bates <bates_at_stat.wisc.edu>
Date: Fri, 29 Feb 2008 10:51:04 -0600

On Fri, Feb 29, 2008 at 10:32 AM, Nauta, A.L. <A.L.Nauta_at_students.uu.nl> wrote:

> I tried a 6-way anova, and indeed found out that changing the order of
> factors influences the SS, F-ratios and p-values. So what should I do if I
> want to know which factor most strongly rejects H0? (H0 is the hypothesis of
> "no difference" in the population means.) Should I instead do 6 one-way
> anovas (one per factor) and then compare the p-values?

No.

If you are going to perform a 6-way anova on an unbalanced data set, you should either read more about the analysis of variance, so that you understand the model and the hypotheses involved, or ask a statistical consultant. This is not a topic that can be explained in a couple of email messages.

You may find Bill Venables' paper "Exegeses on Linear Models" a good starting point (an internet search on the title will turn up a copy).

> ________________________________
>
> From: dmbates_at_gmail.com on behalf of Douglas Bates
> Sent: Fri 29-2-2008 15:38
> To: Nauta, A.L.
> Cc: R Help
> Subject: Re: [R] unbalanced one-way ANOVA
>
> On Fri, Feb 29, 2008 at 4:47 AM, Nauta, A.L. <A.L.Nauta_at_students.uu.nl> wrote:
>
> > Thank you for your reply,
> > is your answer (that the approach does not depend on balance in the data)
> > only valid for one-way anova, or also for two-way or more-way anova?
>
> Any kind.
>
> You should be aware that for unbalanced data sets the sum of squares
> attributed to a term depends on the order in which the terms occur in
> the model. That is, the sum of squares and the F-ratios and the
> p-values for, say, factor A will be different if you fit a model
>
> y ~ A + B
>
> versus the model
>
> y ~ B + A
>
> to a data set where factors A and B are unbalanced.
>
> This is because the sums of squares displayed by R's anova methods are
> the sequential sums of squares. Although other statistical software
> may calculate other, more exotic, types of sums of squares, many of us
> would argue that these are the only ones that make sense.
>
> If in doubt about which sum of squares to use, the general rule is
> that you should only pay attention to the F ratio and p-value for the
> last term in the model.
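>
> A minimal sketch of this order dependence, assuming made-up data: the
> data frame `d`, its cell counts, and the seed are all hypothetical and
> serve only to make the layout unbalanced.
>
> ```r
> set.seed(1)
> # Hypothetical unbalanced layout: cell counts a1b1=5, a1b2=2, a2b1=1, a2b2=4
> d <- data.frame(
>   A = factor(rep(c("a1", "a2"), times = c(7, 5))),
>   B = factor(c(rep("b1", 5), rep("b2", 2), rep("b1", 1), rep("b2", 4)))
> )
> # Simulated response with an effect of both factors plus noise
> d$y <- rnorm(nrow(d)) + as.integer(d$A) + 0.5 * as.integer(d$B)
>
> anova(lm(y ~ A + B, data = d))  # A entered first: SS for A ignores B
> anova(lm(y ~ B + A, data = d))  # A entered last: SS for A is adjusted for B
> ```
>
> With unequal cell counts like these, the row for A should differ between
> the two tables while the residual line stays the same, because both calls
> fit the same model and only the sequential partition of the sums of
> squares changes.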
>
> > ________________________________
> > From: dmbates_at_gmail.com on behalf of Douglas Bates
> > Sent: Fri 29-2-2008 0:39
> > To: Nauta, A.L.
> > Cc: r-help_at_r-project.org
> > Subject: Re: [R] unbalanced one-way ANOVA
> >
> > On Thu, Feb 28, 2008 at 7:52 AM, Nauta, A.L. <A.L.Nauta_at_students.uu.nl> wrote:
> > > Hi,
> > >
> > > I have an unbalanced dataset on which I would like to perform a one-way
> > > anova test using R (aov). According to Wonnacott and Wonnacott (1990) p.
> > > 333, one-way anova with unbalanced data is possible with a few
> > > modifications in the anova calculations. The modified anova calculations
> > > should take into account different sample sizes and a modified definition
> > > of the average. I was wondering if the aov function in R is suitable for
> > > one-way anova on unbalanced data.
> >
> > Yes.
> >
> > The analysis of variance is performed in R by fitting a linear model
> > created from indicator variables for the levels of the factor. The
> > validity of this approach does not depend on balance in the data.
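> >
> > As a rough illustration of that point, the sketch below uses made-up
> > data (the factor `g`, the group sizes 3, 5 and 8, and the seed are
> > assumptions, not taken from the original question) to show the indicator
> > columns and the matching F test from aov and lm.
> >
> > ```r
> > set.seed(2)
> > # Hypothetical unbalanced one-way layout: group sizes 3, 5 and 8
> > g <- factor(rep(c("g1", "g2", "g3"), times = c(3, 5, 8)))
> > y <- rnorm(length(g), mean = as.integer(g))
> >
> > head(model.matrix(~ g))  # intercept plus indicator columns for g2 and g3
> > summary(aov(y ~ g))      # one-way ANOVA table from aov
> > anova(lm(y ~ g))         # same F test from the underlying linear model
> > ```
> >
> > Nothing in these calls needs the group sizes to be equal; the unequal
> > counts simply show up as unequal numbers of rows per indicator pattern.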
> >
> > The formulas given in an introductory textbook are almost never the
> > way that results are computed in practice. I think we would all be
> > better off if they didn't even give these misleading formulas.
> >
>

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 29 Feb 2008 - 16:53:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.