From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Fri 03 Feb 2006 - 01:34:12 EST

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Feb 03 01:45:31 2006

I don't know what the goal of the analysis is, but I have a suspicion that
the `gbm' package might be a more fruitful way...

Cheers,

Andy
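
A minimal sketch of what trying gbm could look like, for illustration only. The data frame below is simulated and every name in it (pos1 to pos4, fitness, the four-letter alphabet) is hypothetical, not taken from the real data. The idea, as far as I understand the package, is that boosted trees take factors as they are rather than expanding them into one dense dummy-variable matrix, and interaction.depth = 2 lets the trees pick up 2-way interactions. The tuning values are placeholders, not recommendations.

```r
# Simulated stand-in for the sequence data: each column is a position (factor)
# and each level is the amino acid present. All names are hypothetical.
set.seed(1)
n  <- 500
aa <- c("A", "G", "L", "V")
d <- data.frame(
  pos1 = factor(sample(aa, n, replace = TRUE)),
  pos2 = factor(sample(aa, n, replace = TRUE)),
  pos3 = factor(sample(aa, n, replace = TRUE)),
  pos4 = factor(sample(aa, n, replace = TRUE))
)

# Fitness depends on pos1 and on a pos1:pos2 interaction, plus noise.
d$fitness <- ifelse(d$pos1 == "A", 1, 0) +
  ifelse(d$pos1 == "A" & d$pos2 == "G", 2, 0) +
  rnorm(n, sd = 0.5)

# Only fit if the gbm package is installed (it is not part of base R).
if (requireNamespace("gbm", quietly = TRUE)) {
  fit <- gbm::gbm(fitness ~ ., data = d,
                  distribution = "gaussian",
                  n.trees = 200,           # placeholder value
                  interaction.depth = 2,   # allow 2-way interactions
                  shrinkage = 0.05,        # placeholder value
                  n.minobsinnode = 10,
                  verbose = FALSE)
  # Relative influence of each position, without drawing the bar plot.
  rel_inf <- summary(fit, n.trees = 200, plotit = FALSE)
  print(rel_inf)
}
```

The relative-influence table is only a screening device; it ranks positions by how much they reduce squared error, which is not the same as an ANOVA table.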

From: Lucy Crooks

> Thanks for your reply.
>
> Thanks for info on aov - hadn't been able to tell which to use from
> help pages. There are no random effects so will switch to lm().
>
> The data are amino acid sequences, with factor being position and
> level which amino acid is present. There are indeed an average of
> around 8 per position (from 2 to 20). I don't think I can collapse
> the levels at least to start with as I don't know in advance which
> affect fitness (the y variable).
>
> From what you say R should be able to do the smaller analysis. So
> have increased the RAM and will try this again.
>
> Lucy Crooks
>
> On Feb 1, 2006, at 3:45 PM, Peter Dalgaard wrote:
> > You do not want to use aov() on unbalanced data, and especially not on
> > large data sets if random effects are involved. Rather, you need to
> > look at lmer() or just lm() if no random effects are present.
> >
> > However, even so, if you really have 29025 parameters to estimate, I
> > think you're out of luck. 8 billion (US) elements is 64G and R is not
> > able to handle objects of that size - the limit is that the size must
> > fit in a 32 bit integer (about 2 billion elements).
> >
> > A quick calculation suggests that your factors have around 8 levels
> > each. Is that really necessary, or can you perhaps collapse some
> > levels?
> >
> > --
> >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
>
> > Lucy Crooks <Lucy.Crooks@env.ethz.ch> writes:
> >> I want to do an unbalanced anova on 272,992 observations with 405
> >> factors including 2-way interactions between 1 of these factors and
> >> the other 404. After fitting only 11 factors and their interactions I
> >> get error messages like:
> >>
> >> Error: cannot allocate vector of size 1433066 Kb
> >> R(365,0xa000ed68) malloc: *** vm_allocate(size=1467461632) failed
> >> (error code=3)
> >> R(365,0xa000ed68) malloc: *** error: can't allocate region
> >> R(365,0xa000ed68) malloc: *** set a breakpoint in szone_error to
> >> debug
> >>
> >> I think that the anova involves a matrix of 272,992 rows by 29025
> >> columns (using dummy variables)=7,900 million elements. I realise
> >> this is a lot! Could I solve this if I had more RAM or is it just too
> >> big?
> >>
> >> Another possibility is to do 16 separate analyses on 17,062
> >> observations with 404 factors (although statistically I think the
> >> first approach is preferable). I get similar error messages then:
> >>
> >> Error: cannot allocate vector of size 175685 Kb
> >> R(365,0xa000ed68) malloc: *** vm_allocate(size=179904512) failed
> >> (error code=3)
> >>
> >> I think this analysis requires a 31 million element matrix.
> >>
> >> I am using R version 2.2.1 on a Mac G5 with 1 GB RAM running OS
> >> 10.4.4. Can somebody tell me what the limitations of my machine (or
> >> R) are likely to be? Whether this smaller analysis is feasible? and
> >> if so how much more memory I might require?
> >>
> >> The data is in R in a data frame of 272,992 rows by 406 columns. I
> >> would really appreciate any helpful input.
> >>
>

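The sizes quoted in the thread can be checked with a few lines of arithmetic. This is a rough sketch that assumes the design matrix is stored dense, as 8-byte doubles, and counts a single copy only; the helper name dense_matrix_size is made up for illustration. lm() works on more than one copy of the matrix, so real requirements will be higher than these figures.

```r
# Size of a dense double-precision matrix, in elements and (decimal) gigabytes.
# Hypothetical helper, not part of any package.
dense_matrix_size <- function(n_rows, n_cols, bytes_per_element = 8) {
  # Work in doubles so the product cannot overflow R's 32-bit integers.
  elements <- as.numeric(n_rows) * as.numeric(n_cols)
  list(
    elements         = elements,
    gigabytes        = elements * bytes_per_element / 1e9,
    fits_32bit_index = elements <= .Machine$integer.max
  )
}

# Full analysis: 272,992 observations by 29025 dummy-variable columns.
full <- dense_matrix_size(272992, 29025)

# Smaller analysis: the 31 million elements quoted in the original post.
small <- dense_matrix_size(31e6, 1)

print(full)
print(small)
```

The full matrix has 7,923,592,800 elements, which is well beyond the roughly 2 billion element limit mentioned above, so more RAM alone would not have helped there. The smaller matrix stays under that limit and needs about a quarter of a gigabyte per copy.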
This archive was generated by hypermail 2.1.8 : Fri 03 Feb 2006 - 14:38:04 EST