[R] slow computation of mixed ANOVA using aov

From: Steven Lacey <slacey_at_umich.edu>
Date: Sat 19 Mar 2005 - 03:10:59 EST


Dear R-help list,  

I am trying to do a mixed ANOVA on a 8960 x 5 dataframe. I have 3 factors for which I want to test all main effects and interactions : f1 (40 levels), f2 (7 levels), and f3 (4 levels). I also have a subject factor, subject, and a dependent variable, dv.  

Some more information about the factors: f2 is a between-subject factor. That is, for each level of f2 there are 8 nested levels of the subject factor. For example, levels 1-8 of subject are nested in level 1 of f2. Levels 9-16 of subject are nested in level 2 of f2. In other words, the subjects that participated in any level of f2 are different from the subjects that participated in any other level of f2.  

In contrast, f1 and f3 are within-subject factors. That is, for any one of the 56 subjects, we have a 160 medians corresponding to each condition from a complete crossing of factors f1 and f2. While it is true that we do have replicate observations for any subject in each of these conditions, we take the median of those values and operate as if there is only a single observation for each subject in each of the 160 within-subject conditions.  

Below is code that will generate dataframe d, which is just like the one I am working with:  

f1<-gl(40,1,8960,ordered=T)
f2<-gl(7,1280,8960,ordered=T)
f3<-gl(4,40,8960,ordered=T)

subject<-gl(56,160,8960,ordered=T)
dv<-rnorm(8960,mean=500,sd=50)
d <- data.frame(f1,f2,f3,f4,dv)
 

To run the mixed ANOVA I use the following call (modeled after J. Baron): aov(dv~f1*f2*f3+Error(subject/(f1*f2)),data=d)

WARNING: Exert caution when running the aov command. I have run the exact same command on Windows and Unix machines (each with 1GB of RAM; allocated up to 3 or 4GB of memory for the analysis ) and it has taken many, many hours to finish. That said, this is not a new problem posted on the R-help list. There are several posts where analysts have run into similar problems. My general impression of these posts, and correct me if I am wrong, is that because aov is a wrapper around lm, the extra time is required to build and manipulate a design matrix (via qr decomposition) that is 8960 x several thousand columns large! Is that so? It seems fitting because if I call aov with only a single factor, then it returns in a few seconds.  

In order to test if the computational slowness was something unique to R, I ran the same analysis, including all 3 factors, in SPSS. To my surprise SPSS returned almost instantaneously. I do not know much about the algorithm in SPSS, but I suspect it may be calculating condition means and sums of squares rather than generating a design matrix. Does that sound plausible?  

At this point I am a dedicated R user. However, I do the kind of analysis described above quite often. It is important that my statistical package be able to handle it more efficiently than what I have been able to get R to do at this point. Am I doing anything obviously wrong? Is there a method in R that more closely resembles the algorithm used in SPSS? If not, are there any other methods R has to do these kind of analyses? Could I split up the analysis in some principled way to ease the processing demand on R?  

Thanks in advanvce for any further insight into this issue, Steve Lacey

        [[alternative HTML version deleted]]



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Mar 19 03:14:49 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:30:51 EST