Re: [R] memory limit in aov

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 03 Feb 2006 - 01:34:12 EST


I don't know what the goal of the analysis is, but I have a suspicion that the `gbm' package might be a more fruitful way...

Cheers,
Andy

From: Lucy Crooks
>
> Thanks for your reply.
>
> Thanks for the info on aov; I hadn't been able to tell which to use from
> the help pages. There are no random effects so I will switch to lm().
>
> The data are amino acid sequences, with factor being position and
> level which amino acid is present. There are indeed an average of
> around 8 per position (from 2 to 20). I don't think I can collapse
> the levels at least to start with as I don't know in advance which
> affect fitness (the y variable).
>
> From what you say R should be able to do the smaller analysis. So
> have increased the RAM and will try this again.
>
> Lucy Crooks
>
> On Feb 1, 2006, at 3:45 PM, Peter Dalgaard wrote:
> > You do not want to use aov() on unbalanced data, and especially not on
> > large data sets if random effects are involved. Rather, you need to
> > look at lmer() or just lm() if no random effects are present.
> >
> > However, even so, if you really have 29025 parameters to estimate, I
> > think you're out of luck. 8 billion (US) elements is 64G and R is not
> > able to handle objects of that size - the limit is that the size must
> > fit in a 32 bit integer (about 2 billion elements).
> >
> > A quick calculation suggests that your factors have around 8 levels
> > each. Is that really necessary, or can you perhaps collapse some
> > levels?
> >
> >
> >
> > --
> >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
>
>
> > Lucy Crooks <Lucy.Crooks@env.ethz.ch> writes:
> >> I want to do an unbalanced anova on 272,992 observations with 405
> >> factors including 2-way interactions between 1 of these factors and
> >> the other 404. After fitting only 11 factors and their interactions I
> >> get error messages like:
> >>
> >> Error: cannot allocate vector of size 1433066 Kb
> >> R(365,0xa000ed68) malloc: *** vm_allocate(size=1467461632) failed
> >> (error code=3)
> >> R(365,0xa000ed68) malloc: *** error: can't allocate region
> >> R(365,0xa000ed68) malloc: *** set a breakpoint in szone_error to
> >> debug
> >>
> >> I think that the anova involves a matrix of 272,992 rows by 29025
> >> columns (using dummy variables) = 7,900 million elements. I realise
> >> this is a lot! Could I solve this if I had more RAM or is it just too
> >> big?
> >>
> >> Another possibility is to do 16 separate analyses on 17,062
> >> observations with 404 factors (although statistically I think the
> >> first approach is preferable). I get similar error messages then:
> >>
> >> Error: cannot allocate vector of size 175685 Kb
> >> R(365,0xa000ed68) malloc: *** vm_allocate(size=179904512) failed
> >> (error code=3)
> >>
> >> I think this analysis requires a 31 million element matrix.
> >>
> >> I am using R version 2.2.1 on a Mac G5 with 1 GB RAM running OS
> >> 10.4.4. Can somebody tell me what the limitations of my machine (or
> >> R) are likely to be, whether this smaller analysis is feasible, and
> >> if so how much more memory I might require?
> >>
> >> The data is in R in a data frame of 272,992 rows by 406 columns. I
> >> would really appreciate any helpful input.
> >>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>

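For readers following the dummy-variable count discussed in the quoted thread, here is a minimal sketch in R of how factors and a two-way interaction expand into design-matrix columns, and how the same model is fitted with lm(). It is not code from any of the posters. The data are a made-up toy set, and the names pos1, pos2 and fitness are hypothetical stand-ins for two sequence positions and the response.

```r
# Toy data, NOT the poster's: two hypothetical sequence positions.
d <- expand.grid(
  pos1 = c("A", "G", "L"),        # factor with 3 levels
  pos2 = c("D", "E", "K", "R"),   # factor with 4 levels
  stringsAsFactors = TRUE
)

# Repeat the 12 cells unequally so the design is unbalanced (24 rows).
d <- d[rep(seq_len(nrow(d)), times = rep(1:3, 4)), ]

set.seed(1)
d$fitness <- rnorm(nrow(d))       # simulated response, illustration only

# Dummy-variable design matrix with default treatment contrasts:
# intercept + (3 - 1) + (4 - 1) + (3 - 1) * (4 - 1) = 12 columns.
X <- model.matrix(fitness ~ pos1 * pos2, data = d)
dim(X)

# lm(), as suggested in the thread for the case without random effects.
fit <- lm(fitness ~ pos1 * pos2, data = d)
anova(fit)                        # sequential (Type I) ANOVA table
```

The column count grows with the product of (levels - 1) for each interaction, which is why a few hundred multi-level factors interacting with one other factor reach tens of thousands of columns.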

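The size arithmetic in the quoted thread can be checked with a few lines of R. This is only a sketch under one stated assumption: a dense design matrix stored as 8-byte doubles. The row and column counts are the ones quoted in the thread, and the variable names are made up for illustration.

```r
# Dimensions quoted in the thread for the full design matrix.
n_obs  <- 272992   # observations (rows)
n_cols <- 29025    # dummy-variable columns (parameters)

# Plain numeric literals are doubles, so this product does not overflow.
# With integer literals (272992L * 29025L) R would return NA with a warning.
n_elem <- n_obs * n_cols

bytes_per_double <- 8              # assumption: dense matrix of doubles
size_bytes <- n_elem * bytes_per_double
size_gib   <- size_bytes / 2^30

# The limit cited in the thread: the element count must fit in a
# 32-bit signed integer, i.e. be at most .Machine$integer.max.
fits_32bit <- n_elem <= .Machine$integer.max

cat(sprintf("elements: %.0f\n", n_elem))
cat(sprintf("size: %.1f GiB\n", size_gib))
cat("fits in a 32-bit integer:", fits_32bit, "\n")
```

The product is about 7.9 billion elements, or roughly 63.4 GB (59 GiB), which is consistent with the "64G" estimate in the thread and well over the 2^31 - 1 element limit described there.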

Received on Fri Feb 03 01:45:31 2006

This archive was generated by hypermail 2.1.8 : Fri 03 Feb 2006 - 14:38:04 EST