Re: [R] memory problems when combining randomForests

From: Liaw, Andy
Date: Fri 28 Jul 2006 - 23:28:09 EST

From: Eleni Rapsomaniki
> Hi Andy,
> > > I'm using R (windows) version 2.1.1, randomForest version 4.15.
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > Never seen such a version...
> Ooops! I meant 4.5-15
> > > I then save each tree to a file so I can combine them all
> > > afterwards. There are no memory issues when
> > > keep.forest=FALSE. But I think that's the bit I need for
> > > future predictions (right?).
> >
> > Yes, but what is your question? (Do you mean each *forest*,
> > instead of each *tree*?)
> I mean the component of the object that is created from
> randomForest that has
> the name "forest" (and takes up all the memory!).

Yes, the forest can take up quite a bit of space. You might consider setting nodesize larger and seeing whether that saves enough space without compromising prediction performance.
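
To make the nodesize suggestion concrete, here is a minimal sketch that compares the size of the stored forest for two nodesize settings. It assumes the randomForest package is installed and uses the built-in iris data purely as a stand-in; the values nodesize = 20 and ntree = 100 are arbitrary choices for illustration, not recommendations.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(42)
x <- iris[, 1:4]       # stand-in predictors (illustration only)
y <- iris$Species      # stand-in class labels

# Same data and number of trees; only nodesize differs.
# nodesize = 1 is the default for classification.
rf_small_nodes <- randomForest(x, y, ntree = 100, nodesize = 1,  keep.forest = TRUE)
rf_large_nodes <- randomForest(x, y, ntree = 100, nodesize = 20, keep.forest = TRUE)

# Memory taken by the "forest" component in each fit, in bytes
size_small <- as.numeric(object.size(rf_small_nodes$forest))
size_large <- as.numeric(object.size(rf_large_nodes$forest))
print(c(nodesize_1 = size_small, nodesize_20 = size_large))

# OOB error after the last tree, to check what the saving costs in accuracy
oob_small <- rf_small_nodes$err.rate[100, "OOB"]
oob_large <- rf_large_nodes$err.rate[100, "OOB"]
print(c(nodesize_1 = oob_small, nodesize_20 = oob_large))
```

Whether the space saving is worth any change in OOB error depends on the data, so the comparison is worth repeating on the real training set before settling on a value.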

> > > A bit off the subject, but should the order in which rows
> > > (ie. sets of explanatory variables) are passed to the
> > > randomForest function affect the result? I have noticed that
> > > if I pick a random unordered sample from my control data for
> > > training the error rate is much lower than if I take an
> > > ordered sample. This remains true for all my cross-validation
> > > results.
> >
> > I'm not sure I understand. In randomForest() (as in other
> > functions) variables are in columns, rather than rows, so
> > are you talking about variables (columns) in different order
> > or data (rows) in different order?
> Yes, sorry I confused you. I mean the order in which data
> (rows) is passed, not
> columns.

Then I'm not sure what you mean by difference in performance, even in cross-validation. Perhaps you can show an example? Each tree in the forest is grown on a random sample from the data, so the order of the rows cannot matter.
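
The point about random sampling can be illustrated with a small base R sketch. It only mimics the per-tree sampling step (rows drawn with replacement, which is assumed here to match the default behaviour) and is not the package's internal code; the iris data and the helper boot_prop are hypothetical stand-ins for illustration.

```r
set.seed(1)

# The same rows in two different orders
ordered_rows  <- iris[order(iris$Species), ]   # sorted by class
shuffled_rows <- iris[sample(nrow(iris)), ]    # random order

# Hypothetical helper: draw one bootstrap sample of row indices and
# return the proportion of a given class in that sample.
boot_prop <- function(d, cls = "setosa") {
  idx <- sample(nrow(d), nrow(d), replace = TRUE)
  mean(d$Species[idx] == cls)
}

# Average class proportion over many bootstrap samples, for each ordering
prop_ordered  <- mean(replicate(500, boot_prop(ordered_rows)))
prop_shuffled <- mean(replicate(500, boot_prop(shuffled_rows)))
print(c(ordered = prop_ordered, shuffled = prop_shuffled))
```

Both averages should sit close to the true class proportion (one third for iris), because each bootstrap draw picks rows at random regardless of where they appear in the data frame.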

> Finally, I see from
> that there is a component in Breiman's implementation of
> randomForest that
> computes interactions between parameters. Has this been
> implemented in R yet?

No. Prof. Breiman told me that it is very experimental, and he wouldn't mind if it doesn't make it into the R package. Since I have other priorities for the package, that naturally went on the back burner.


> Many thanks for your time and help.
> Eleni Rapsomaniki
Received on Sat Jul 29 00:50:18 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 31 Jul 2006 - 22:16:50 EST.