Re: [R] Result depends on order of factors in unbalanced designs (lme, anova)?

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri, 22 Jun 2007 16:26:57 +0100 (BST)

'anova' is rather a misnomer here. In terms of the description in ?anova.lme, you have

      When only one fitted model object is present, a data frame with
      the sums of squares, numerator degrees of freedom, denominator
      degrees of freedom, F-values, and P-values for Wald tests for the
      terms in the model ...

but there are no 'sums of squares' shown. However, the crucial part of that help page is

     type: an optional character string specifying the type of sum of
           squares to be used in F-tests for the terms in the model. If
           '"sequential"', the sequential sum of squares obtained by
           including the terms in the order they appear in the model is
           used; else, if '"marginal"', the marginal sum of squares
           obtained by deleting a term from the model at a time is used.
           This argument is only used when a single fitted object is
           passed to the function. Partial matching of arguments is
           used, so only the first character needs to be provided.
           Defaults to '"sequential"'.

so these are sequential fits (just like anova for an lm fit), and yes, sequential fits do in general depend on the sequence of terms.

The issues of interpretation are exactly those of unbalanced linear models, and you will find advice on that in many places, e.g. in MASS.

On Thu, 21 Jun 2007, Karl Knoblick wrote:

> Dear R-Community!
>
> For example I have a study with 4 treatment groups (10 subjects per group) and 4 visits. Additionally, the gender is taken into account. I think - and hope this is a goog idea (!) - this data can be analysed using lme as below.
>
> In a balanced design everything is fine, but in an unbalanced design there are differences depending on fitting y~visit*treat*gender or y~gender*visit*treat - at least with anova (see example). Does this make sense? Which ordering might be the correct one?
>
> Here the example script:
> library(nlme)
> set.seed(123)
> # Random generation of data:
> NSubj<-40 # No. of subjects
> set.seed(1234)
> id<-factor(rep(c(1:NSubj),4)) # ID of subjects
> treat<-factor(rep(rep(1:4,each=5),4)) # Treatment 4 Levels
> gender<-factor(rep(rep(1:2, each=20),4))
> visit<-factor(rep(1:4, each=NSubj))
> y<-runif(4*NSubj) # Results
> # Add effects
> y<-y+0.01*as.integer(visit)
> y<-y+0.02*as.integer(gender)
> y<-y+0.024*as.integer(treat)
> df<-data.frame(id, treat, gender, visit, y)
> # groupedData object for lme
> gdat<-groupedData(y ~ visit|id, data=df)
> # fits - different ordering of factors
> fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> anova(fit1)
> fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> anova(fit2)
> # Result: identical (balanced design so far), ok
> # Now change gender of subject 1
> gdat$gender[c(1,41,81,121)]<-2
> # onece more fits with different ordering of factors
> fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> anova(fit1)
> fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> anova(fit2)
> # Result: There are differences!!
>
> Hope anybody can help or give me advice how to interpret these results correctly or how to avoid this problem! Is there a better possibility to analyse these data than lme?
>
> Thanks!
> Karl

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 22 Jun 2007 - 15:38:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 22 Jun 2007 - 16:32:10 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.