Re: [R] aov error with large data set

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Wed, 16 Jul 2008 20:27:26 +0200

Mike Lawrence wrote:
> I'm looking to analyze a large data set: a within-Ss 2*2*1500 design
> with 20 Ss. However, aov() gives me an error, reproducible as follows:
>
> id = factor(1:20)
> a = factor(1:2)
> b = factor(1:2)
> d = factor(1:1500)
> temp = expand.grid(id=id, a=a, b=b, d=d)
> temp$y = rnorm(length(temp[, 1])) #generate some random DV data
> this_aov = aov(
> y~a*b*d+Error(id/(a*b*d))
> , data=temp
> )
>
> Which yields the following error:
> "
> Error in model.matrix.default(mt, mf, contrasts) :
> allocMatrix: too many elements specified
> "
>
> Any suggestions?
>
This is an inherent weakness of aov(), or at least the current implementation thereof. You end up fitting a set of linear models with a huge number of parameters, in order to get the separation into strata. The column dimensions of the design matrices are the number of random effects, and if you have 60000 of those, you run out of storage. (As written, you even have 120000=20*2*2*1500 for the id*a*b*d term, but removing it isn't really going to help.)
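
As a rough back-of-envelope check of those numbers (a sketch only: the level counts are taken from the example above, and which Error() term actually trips the limit is an assumption I have not verified), one can count the level combinations and compare the number of cells in a dense matrix with the largest integer R can represent:

```r
# Level counts taken from the example in the question
n_id <- 20; n_a <- 2; n_b <- 2; n_d <- 1500

# Number of level combinations for some of the Error() terms
cols_id_d   <- n_id * n_d               # id:d
cols_id_a_d <- n_id * n_a * n_d         # id:a:d, the "60000" case
cols_full   <- n_id * n_a * n_b * n_d   # id:a:b:d, the "120000" case
n_obs       <- n_id * n_a * n_b * n_d   # rows in 'temp'

# Assumption: a dense matrix with n_obs rows and cols_id_a_d columns is needed.
# Its cell count is compared with the largest representable integer.
cells <- n_obs * cols_id_a_d
c(cells = cells, limit = .Machine$integer.max)
cells > .Machine$integer.max
```

With these counts the cell total is 7.2e9, well beyond `.Machine$integer.max`, which is consistent with the "too many elements" message.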

(30 years ago, a much more efficient algorithm was implemented in Genstat, but we seem to be short of volunteers to reimplement it...)

Ideas? Here are three:

lme4 should be able to handle such designs. It won't get the df for the F tests, but you could work them out by hand.

or, you could try recasting as a multivariate lm problem (see my recent R News paper). This is still pretty huge, but this time the limiting quantity is the 6000*6000 empirical covariance matrix, which could be manageable.

or, the most efficient way, but much more work for you: Generate the relevant tables of means and residuals, e.g. by placing your data in a 20*2*2*1500 array and using the relevant combinations of apply() and sweep(). These can be used to generate the relevant sums of squares.
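
A minimal sketch of that last idea, for the main effect of a only and with d cut down to 5 levels so it runs instantly. The variable names (Y, m_ida, ss_a and so on) are made up for illustration, and the other effects would need their own tables of means built the same way:

```r
set.seed(1)
n_id <- 20; n_a <- 2; n_b <- 2; n_d <- 5   # d reduced from 1500 for illustration
temp <- expand.grid(id = factor(1:n_id), a = factor(1:n_a),
                    b = factor(1:n_b), d = factor(1:n_d))
temp$y <- rnorm(nrow(temp))

# expand.grid varies id fastest, then a, b, d, so y fills the array directly
Y <- array(temp$y, dim = c(n_id, n_a, n_b, n_d))

grand <- mean(Y)
m_id  <- apply(Y, 1, mean)          # subject means
m_a   <- apply(Y, 2, mean)          # means for the levels of a
m_ida <- apply(Y, c(1, 2), mean)    # id x a table of means

# Main effect of a: each level mean is based on n_id * n_b * n_d observations
ss_a <- n_id * n_b * n_d * sum((m_a - grand)^2)

# id:a stratum: remove subject means and a effects from the id x a table
r_ida  <- sweep(m_ida, 1, m_id)          # subtract subject means by row
r_ida  <- sweep(r_ida, 2, m_a - grand)   # subtract a effects by column
ss_ida <- n_b * n_d * sum(r_ida^2)       # each cell mean covers n_b * n_d values

df_a   <- n_a - 1
df_ida <- (n_id - 1) * (n_a - 1)
F_a    <- (ss_a / df_a) / (ss_ida / df_ida)
p_a    <- pf(F_a, df_a, df_ida, lower.tail = FALSE)
```

Since a has only two levels here, F_a should equal the squared paired t statistic computed on the two columns of m_ida, which gives a quick sanity check. None of the steps builds a design matrix, so the same code with n_d set to 1500 only needs the 120000-element array.
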
> Mike
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> www.memetic.ca
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
> - Piet Hein
>
"Problems worthy of attack, prove their worth by hitting back" - Piet Hein

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard_at_biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 16 Jul 2008 - 18:30:23 GMT
