Re: [R] unbalanced one-way ANOVA

From: Douglas Bates <>
Date: Fri, 29 Feb 2008 08:38:14 -0600

On Fri, Feb 29, 2008 at 4:47 AM, Nauta, A.L. <> wrote:

> Thank you for your reply,
> is your answer (that the approach does not depend on balance in the data)
> only valid for one-way anova, or also for two-way or more-way anova?

Any kind.

You should be aware that for unbalanced data sets the sum of squares attributed to a term depends on the order in which the terms occur in the model. That is, the sum of squares and the F-ratios and the p-values for, say, factor A will be different if you fit a model

y ~ A + B

versus the model

y ~ B + A

to a data set where factors A and B are unbalanced.
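
A minimal sketch of this order dependence, using made-up data. The data frame `d`, its cell counts, and the effect sizes are assumptions chosen only to make the design unbalanced; they are not from the original question.

```r
# Hypothetical unbalanced two-factor data: cell counts are 7, 2, 1 and 4.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(9, 5))),
  B = factor(c(rep(c("b1", "b2"), times = c(7, 2)),
               rep(c("b1", "b2"), times = c(1, 4))))
)
d$y <- rnorm(nrow(d)) + 1.5 * (d$A == "a2") + 1.0 * (d$B == "b2")

# Same model, same fitted values, different order of terms.
tab_AB <- anova(lm(y ~ A + B, data = d))
tab_BA <- anova(lm(y ~ B + A, data = d))

print(table(d$A, d$B))  # shows the unequal cell counts
print(tab_AB)
print(tab_BA)

# Sum of squares attributed to A when it enters first versus last.
ss_A_first <- tab_AB["A", "Sum Sq"]
ss_A_last  <- tab_BA["A", "Sum Sq"]
```

The residual line is the same in both tables, because both formulas describe the same fitted model; only the split of the explained sum of squares between A and B changes.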

This is because the sums of squares displayed by R's anova methods are the sequential sums of squares. Although other statistical software may calculate other, more exotic, types of sums of squares, many of us would argue that these are the only ones that make sense.
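
To make "sequential" concrete, here is a sketch that rebuilds the sums of squares by hand as successive drops in the residual sum of squares when terms are added one at a time. The data and the helper `rss()` are hypothetical, introduced only for illustration.

```r
# Hypothetical unbalanced data, as an illustration only.
set.seed(2)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(9, 5))),
  B = factor(c(rep(c("b1", "b2"), times = c(7, 2)),
               rep(c("b1", "b2"), times = c(1, 4))))
)
d$y <- rnorm(nrow(d)) + 1.5 * (d$A == "a2") + 1.0 * (d$B == "b2")

# Illustrative helper: residual sum of squares of a fitted linear model.
rss <- function(fit) sum(residuals(fit)^2)

m0 <- lm(y ~ 1, data = d)      # intercept only
m1 <- lm(y ~ A, data = d)      # add A
m2 <- lm(y ~ A + B, data = d)  # then add B

# Each term is credited with the reduction in RSS at the step where it enters.
seq_ss <- c(A = rss(m0) - rss(m1), B = rss(m1) - rss(m2))

tab <- anova(m2)
print(cbind(by_hand = seq_ss, from_anova = tab[c("A", "B"), "Sum Sq"]))
```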

If in doubt about which sum of squares to use, the general rule is that you should only pay attention to the F ratio and p-value for the last term in the model.
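
One way to apply this rule to every term without refitting the model in each order is `drop1()`, which tests each term as if it had been entered last. This is a sketch on made-up data for a main-effects model without interactions; it is one convenient option, not the only way to follow the rule.

```r
# Hypothetical unbalanced data, as an illustration only.
set.seed(3)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(9, 5))),
  B = factor(c(rep(c("b1", "b2"), times = c(7, 2)),
               rep(c("b1", "b2"), times = c(1, 4))))
)
d$y <- rnorm(nrow(d)) + 1.5 * (d$A == "a2") + 1.0 * (d$B == "b2")

fit_AB <- lm(y ~ A + B, data = d)
fit_BA <- lm(y ~ B + A, data = d)

# Last-term lines from the two sequential tables.
last_B <- anova(fit_AB)["B", ]
last_A <- anova(fit_BA)["A", ]

# drop1() removes each term in turn from the full model and compares fits.
d1 <- drop1(fit_AB, test = "F")
print(d1)
```

The A and B lines of `d1` agree with `last_A` and `last_B`, so each term's F ratio and p-value are the ones it would get as the last term in the model.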

> ________________________________
> From: on behalf of Douglas Bates
> Sent: Fri 29-2-2008 0:39
> To: Nauta, A.L.
> Cc:
> Subject: Re: [R] unbalanced one-way ANOVA
> On Thu, Feb 28, 2008 at 7:52 AM, Nauta, A.L. <> wrote:
> > Hi,
> > I have an unbalanced dataset on which I would like to perform a one-way
> > anova test using R (aov). According to Wonnacott and Wonnacott (1990) p.
> > 333, one-way anova with unbalanced data is possible with a few modifications
> > in the anova-calculations. The modified anova calculations should take into
> > account different sample sizes and a modified definition of the average. I
> > was wondering if the aov-function in R is suitable for one-way anova on
> > unbalanced data.
>
> Yes.
>
> The analysis of variance is performed in R by fitting a linear model
> created from indicator variables for the levels of the factor. The
> validity of this approach does not depend on balance in the data.
>
> The formulas given in an introductory textbook are almost never the
> way that results are computed in practice. I think we would all be
> better off if they didn't even give these misleading formulas.

Received on Fri 29 Feb 2008 - 14:48:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 29 Feb 2008 - 15:30:18 GMT.
