From: Lucy Crooks <Lucy.Crooks_at_env.ethz.ch>

Date: Thu 02 Feb 2006 - 21:35:03 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Feb 02 21:43:06 2006

On Feb 1, 2006, at 3:45 PM, Peter Dalgaard wrote:

> You do not want to use aov() on unbalanced data, and especially not on
> large data sets if random effects are involved. Rather, you need to
> look at lmer() or just lm() if no random effects are present.
>
> However, even so, if you really have 29025 parameters to estimate, I
> think you're out of luck. 8 billion (US) elements is 64G and R is not
> able to handle objects of that size - the limit is that the size must
> fit in a 32 bit integer (about 2 billion elements).
>
> A quick calculation suggests that your factors have around 8 levels
> each. Is that really necessary, or can you perhaps collapse some
> levels?
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
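
For reference, the sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the model matrix is stored densely as 8-byte doubles, and it is written in Python purely as a calculator (it makes no R calls). The row and column counts come from the original post; the constant names are illustrative.

```python
# Back-of-envelope check of the sizes quoted above (plain arithmetic only).
ROWS = 272_992            # observations, from the original post
COLS = 29_025             # dummy-variable columns, from the original post
BYTES_PER_DOUBLE = 8      # assumption: dense matrix of 8-byte doubles
MAX_ELEMENTS = 2**31 - 1  # largest count that fits in a signed 32-bit integer

elements = ROWS * COLS
size_bytes = elements * BYTES_PER_DOUBLE

print(f"elements: {elements:,}")
print(f"size: {size_bytes / 1e9:.1f} GB ({size_bytes / 2**30:.1f} GiB)")
print(f"multiple of the 32-bit limit: {elements / MAX_ELEMENTS:.2f}")
```

Under those assumptions the matrix has roughly 7.9 billion elements and needs on the order of 60 GB, several times over the 32-bit element limit mentioned above, so extra RAM alone would not be enough.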

> Lucy Crooks <Lucy.Crooks@env.ethz.ch> writes:
>
>> I want to do an unbalanced anova on 272,992 observations with 405
>> factors including 2-way interactions between 1 of these factors and
>> the other 404. After fitting only 11 factors and their interactions I
>> get error messages like:
>>
>> Error: cannot allocate vector of size 1433066 Kb
>> R(365,0xa000ed68) malloc: *** vm_allocate(size=1467461632) failed
>> (error code=3)
>> R(365,0xa000ed68) malloc: *** error: can't allocate region
>> R(365,0xa000ed68) malloc: *** set a breakpoint in szone_error to
>> debug
>>
>> I think that the anova involves a matrix of 272,992 rows by 29025
>> columns (using dummy variables)=7,900 million elements. I realise
>> this is a lot! Could I solve this if I had more RAM or is it just too
>> big?
>>
>> Another possibility is to do 16 separate analyses on 17,062
>> observations with 404 factors (although statistically I think the
>> first approach is preferable). I get similar error messages then:
>>
>> Error: cannot allocate vector of size 175685 Kb
>> R(365,0xa000ed68) malloc: *** vm_allocate(size=179904512) failed
>> (error code=3)
>>
>> I think this analysis requires a 31 million element matrix.
>>
>> I am using R version 2.2.1 on a Mac G5 with 1 GB RAM running OS
>> 10.4.4. Can somebody tell me what the limitations of my machine (or
>> R) are likely to be? Whether this smaller analysis is feasible? and
>> if so how much more memory I might require?
>>
>> The data is in R in a data frame of 272,992 rows by 406 columns. I
>> would really appreciate any helpful input.
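
The figures for the smaller analysis can be put in the same terms. The sketch below is arithmetic only (Python used as a calculator) and rests on assumptions not stated in the thread: that values are 8-byte doubles, that "Kb" in the error message means 1024 bytes, and that the failed allocation was a single dense matrix with 17,062 rows. The variable names are illustrative.

```python
# Rough sizing of the 17,062-observation analysis (plain arithmetic only).
ROWS = 17_062
BYTES_PER_DOUBLE = 8   # assumption: values stored as 8-byte doubles
KB = 1024              # assumption: "Kb" in the error message means 1024 bytes

# The 31 million element matrix estimated in the post.
matrix_elements = 31_000_000
matrix_bytes = matrix_elements * BYTES_PER_DOUBLE

# The single allocation that failed: "cannot allocate vector of size 175685 Kb".
failed_bytes = 175_685 * KB
failed_doubles = failed_bytes // BYTES_PER_DOUBLE
# Column count implied IF that vector was one dense ROWS x cols matrix.
implied_cols = round(failed_doubles / ROWS)

print(f"31M-element matrix: {matrix_bytes / 2**20:.0f} MiB")
print(f"failed allocation:  {failed_bytes / 2**20:.0f} MiB, "
      f"{failed_doubles:,} doubles, about {implied_cols} columns")
```

Note that the error reports only the one request that failed, not total usage. A fit will likely need more than one object of that size at once, so on a 1 GB machine even a request of a few hundred MiB can fail.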


This archive was generated by hypermail 2.1.8 : Fri 03 Feb 2006 - 02:23:22 EST