Date: Thu 02 Feb 2006 - 21:35:03 EST

On Feb 1, 2006, at 3:45 PM, Peter Dalgaard wrote:

You do not want to use aov() on unbalanced data, and especially not on
large data sets if random effects are involved. Rather, you need to
look at lmer() or just lm() if no random effects are present.
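
As a minimal sketch of the lm() route, assuming made-up toy data (the
data frame d, the factors f1, f2, f3 and the response y are all
hypothetical, not the poster's variables), one factor can be interacted
with each of the others like this:

```r
set.seed(1)

# Hypothetical toy data, unbalanced by construction (unequal level frequencies)
d <- data.frame(
  f1 = factor(sample(letters[1:3], 200, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
  f2 = factor(sample(LETTERS[1:4], 200, replace = TRUE)),
  f3 = factor(sample(c("x", "y"), 200, replace = TRUE))
)
d$y <- rnorm(200)

# Main effects for all factors, plus two-way interactions of f1 with
# each other factor (no f2:f3 term)
fit <- lm(y ~ f1 * (f2 + f3), data = d)

# Sequential tests; with unbalanced data the order of terms matters
print(anova(fit))

# Number of model-matrix columns, i.e. parameters to estimate
n_cols <- ncol(model.matrix(fit))
print(n_cols)
```

If random effects are present, lmer() in the lme4 package takes a
similar formula with added terms such as (1 | group), where group is
again a hypothetical grouping factor.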
However, even so, if you really have 29025 parameters to estimate, I
think you're out of luck. 8 billion (US) elements is 64G and R is not
able to handle objects of that size - the limit is that the size must
fit in a 32 bit integer (about 2 billion elements).
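
The arithmetic behind these figures can be checked with a few lines of
R. This is only a sketch: the row and column counts are the ones quoted
in this thread, 8 bytes is the size of one double-precision value, and
the variable names are chosen for illustration.

```r
n_obs <- 272992   # observations (rows), as quoted in the thread
n_par <- 29025    # estimated model-matrix columns, as quoted in the thread

# Numeric literals are doubles in R, so this product does not overflow
n_elem <- n_obs * n_par
bytes  <- n_elem * 8          # 8 bytes per double-precision element
gib    <- bytes / 1024^3

# Largest value that fits in a signed 32-bit integer: 2^31 - 1
max_len <- .Machine$integer.max

cat("elements:", format(n_elem, big.mark = ","), "\n")
cat("size in GiB:", round(gib, 1), "\n")
cat("fits in a 32-bit length:", n_elem <= max_len, "\n")
```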
A quick calculation suggests that your factors have around 8 levels
each. Is that really necessary, or can you perhaps collapse some
levels?
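
One way such a quick calculation might go is sketched below. It rests on
assumptions that are not stated in the thread: every factor has the same
number of levels k, each factor contributes k - 1 dummy columns, and the
model has an intercept, 405 main effects and 404 two-way interactions.
The function name n_columns is hypothetical.

```r
# Model-matrix columns under the assumptions above
n_columns <- function(k, n_factors = 405, n_inter = 404) {
  d <- k - 1                          # dummy columns per factor
  1 + n_factors * d + n_inter * d^2   # intercept + main effects + interactions
}

k <- 2:12
tab <- data.frame(levels = k, columns = n_columns(k))
print(tab)

# Number of levels whose column count is closest to the quoted 29025
best <- k[which.min(abs(n_columns(k) - 29025))]
print(best)
```

Under these assumptions the quoted column count corresponds to about 8
dummy columns per factor, so halving the number of levels would cut the
interaction columns roughly fourfold.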
Lucy Crooks <Lucy.Crooks@env.ethz.ch> writes:

I want to do an unbalanced ANOVA on 272,992 observations with 405
factors, including 2-way interactions between one of these factors and
the other 404. After fitting only 11 factors and their interactions I
get error messages like:

Error: cannot allocate vector of size 1433066 Kb
R(365,0xa000ed68) malloc: *** vm_allocate(size=1467461632) failed (error code=3)
R(365,0xa000ed68) malloc: *** error: can't allocate region
R(365,0xa000ed68) malloc: *** set a breakpoint in szone_error to debug

I think that the ANOVA involves a matrix of 272,992 rows by 29025
columns (using dummy variables), i.e. about 7,900 million elements. I
realise this is a lot! Could I solve this if I had more RAM or is it
just too big?
Another possibility is to do 16 separate analyses, each on 17,062
observations with 404 factors (although statistically I think the
first approach is preferable). I get similar error messages then:

Error: cannot allocate vector of size 175685 Kb
R(365,0xa000ed68) malloc: *** vm_allocate(size=179904512) failed (error code=3)

I think this analysis requires a 31 million element matrix.
I am using R version 2.2.1 on a Mac G5 with 1 GB RAM running OS
10.4.4. Can somebody tell me what the limitations of my machine (or
R) are likely to be, whether this smaller analysis is feasible, and
if so how much more memory I might require?

The data is in R in a data frame of 272,992 rows by 406 columns. I
would really appreciate any helpful input.
