Re: [R] memory problems when combining randomForests

From: Liaw, Andy
Date: Fri 28 Jul 2006 - 04:59:30 EST

From: Eleni Rapsomaniki
> I'm using R (windows) version 2.1.1, randomForest version 4.15.


Never seen such a version...

> I call randomForest like this:
> my.rf=randomForest(x=train.df[,-response_index],
> y=train.df[,response_index],
> xtest=test.df[,-response_index],
> ytest=test.df[,response_index],
> importance=TRUE,proximity=FALSE, keep.forest=TRUE)
> (where train.df and test.df are my train and test
> data.frames and response_index is the column number
> specifying the class)
> I then save each tree to a file so I can combine them all
> afterwards. There are no memory issues when
> keep.forest=FALSE. But I think that's the bit I need for
> future predictions (right?).

Yes, but what is your question? (Do you mean each *forest*, instead of each *tree*?)  
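For reference, the randomForest package provides combine() for merging forests that were grown separately. Below is a minimal sketch, assuming the built-in iris data as a stand-in for train.df. The file names and the three-part split are hypothetical. Note that keep.forest defaults to FALSE whenever xtest is supplied, so it has to be set explicitly if the forests will be combined or used for prediction later.

```r
library(randomForest)

set.seed(42)
# Grow several small forests separately (e.g. one per R session) and save each
# to disk. keep.forest = TRUE is needed for both combine() and predict().
files <- character(3)
for (i in 1:3) {
  rf <- randomForest(Species ~ ., data = iris, ntree = 50, keep.forest = TRUE)
  files[i] <- file.path(tempdir(), sprintf("rf_part_%d.RData", i))  # hypothetical names
  save(rf, file = files[i])
  rm(rf)
}

# Later: reload each saved forest and merge them into one ensemble.
forests <- lapply(files, function(f) {
  load(f)  # restores the object named 'rf' into this function's environment
  rf
})
rf.all <- do.call(combine, forests)

print(rf.all$ntree)  # total number of trees across all parts
pred <- predict(rf.all, newdata = iris)
```

Only one part needs to be in memory while it is grown, which is the point of splitting the work. The combined object still holds every tree, though, so prediction needs enough memory for the whole ensemble.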

> I did check previous messages on memory issues, and thought
> that combining the trees afterwards would solve the problem.
> Since my cross-validation subsets give me a fairly stable
> error-rate, I suppose I could just use a randomForest trained
> on just a subset of my data. But would I not be "wasting"
> data this way?

Perhaps, but see Jerry Friedman's ISLE (Importance Sampled Learning Ensembles), where he argued that RF with very small trees grown on small random samples can sometimes give even better results.
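If you want to try the "small trees on small samples" idea with randomForest itself, the sampsize argument sets how many rows are drawn for each tree and maxnodes caps the number of terminal nodes. This is only a rough sketch on iris with arbitrarily chosen settings (30 rows, at most 4 terminal nodes). It is not an implementation of ISLE.

```r
library(randomForest)

set.seed(1)
# Default forest: full-size trees, each grown on a bootstrap sample of all rows.
rf.big <- randomForest(Species ~ ., data = iris, ntree = 200)

# Small trees (at most 4 terminal nodes), each grown on 30 rows drawn
# without replacement.
rf.small <- randomForest(Species ~ ., data = iris, ntree = 200,
                         sampsize = 30, replace = FALSE, maxnodes = 4)

# Compare out-of-bag error rates after all 200 trees.
oob <- c(big   = rf.big$err.rate[200, "OOB"],
         small = rf.small$err.rate[200, "OOB"])
print(oob)
```

Each small tree looks at only a fraction of the data. Across many trees, however, most rows are still used, so the subset is not "wasted" in the way a single fixed subset would be.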

> A bit off the subject, but should the order in which rows
> (ie. sets of explanatory variables) are passed to the
> randomForest function affect the result? I have noticed that
> if I pick a random unordered sample from my control data for
> training the error rate is much lower than if I take an
> ordered sample. This remains true for all my cross-validation
> results.

I'm not sure I understand. In randomForest() (as in other functions) variables are in columns, rather than rows, so are you talking about variables (columns) in different order or data (rows) in different order?
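One possible explanation is an assumption here, since the thread does not say how the data are arranged. If the rows are sorted (for example, by class), an "ordered" sample taken from the top of the file will not represent all classes. The sketch below uses iris, whose rows are sorted by Species. It compares training on the first 100 rows against training on 100 randomly chosen rows, with the error measured on the remaining rows. The helper fit_and_test() is hypothetical.

```r
library(randomForest)

set.seed(7)
df <- iris  # rows sorted by Species: 50 setosa, 50 versicolor, 50 virginica

ordered.idx <- 1:100                 # first 100 rows: no virginica at all
random.idx  <- sample(nrow(df), 100) # 100 rows drawn without replacement

print(table(df$Species[ordered.idx]))
print(table(df$Species[random.idx]))

# Hypothetical helper: train on rows 'idx', return error rate on the other rows.
fit_and_test <- function(idx) {
  train <- droplevels(df[idx, ])  # randomForest() refuses empty classes in y
  rf <- randomForest(Species ~ ., data = train)
  held.out <- df[-idx, ]
  pred <- predict(rf, newdata = held.out)
  # Compare as character: the predicted factor may lack some levels.
  mean(as.character(pred) != as.character(held.out$Species))
}

err <- c(ordered = fit_and_test(ordered.idx), random = fit_and_test(random.idx))
print(err)
```

With the ordered sample, every held-out row is a virginica that the model never saw, so the error is 100%. Shuffling the rows before splitting (or sampling at random, as above) avoids this.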


> I'm sorry for my many questions.
> Many Thanks
> Eleni Rapsomaniki
Received on Fri Jul 28 05:07:41 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 28 Jul 2006 - 20:18:00 EST.
