Re: [R] memory problems when combining randomForests [Broadcast]

From: Eleni Rapsomaniki <>
Date: Fri 28 Jul 2006 - 01:07:55 EST

I'm using R version 2.1.1 on Windows with randomForest version 4.15. I call randomForest like this:

my.rf <- randomForest(x = train.df[, -response_index],
                      y = train.df[, response_index],
                      xtest = test.df[, -response_index],
                      ytest = test.df[, response_index],
                      importance = TRUE, proximity = FALSE,
                      keep.forest = TRUE)

(where train.df and test.df are my training and test data frames, and response_index is the column number specifying the class)

I then save each tree to a file so that I can combine them all afterwards. There are no memory issues when keep.forest=FALSE, but I think the stored forest is the bit I need for future predictions (right?).
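A minimal sketch of the save-and-combine workflow described above. It is not the original code: the built-in iris data stands in for train.df, the file names and the ntree value are hypothetical, and it assumes a randomForest version that provides combine(). It is meant only to show the shape of the approach, not to demonstrate that it avoids the memory problem.

```r
library(randomForest)

set.seed(1)
data(iris)
response_index <- 5  # class column in the stand-in data (assumption)

# Train several small forests and save each one to its own file.
files <- character(0)
for (i in 1:3) {
  rf <- randomForest(x = iris[, -response_index],
                     y = iris[, response_index],
                     ntree = 50, keep.forest = TRUE)
  f <- file.path(tempdir(), paste("rf_part_", i, ".RData", sep = ""))
  save(rf, file = f)   # hypothetical file name
  files <- c(files, f)
  rm(rf)               # drop the forest from the workspace
}

# Later: load the saved forests and merge them into one object.
forests <- lapply(files, function(f) {
  env <- new.env()
  load(f, envir = env)
  env$rf
})
combined <- do.call(randomForest::combine, forests)

# The combined object keeps the trees, so it can be used for prediction.
pred <- predict(combined, iris[, -response_index])
```

Note that the combined forest still holds every tree in memory, and that combine() does not carry over the error-rate and confusion components of the individual forests.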

I did check previous messages on memory issues, and thought that combining the trees afterwards would solve the problem. Since my cross-validation subsets give me a fairly stable error-rate, I suppose I could just use a randomForest trained on just a subset of my data. But would I not be "wasting" data this way?

A bit off the subject, but should the order in which rows (i.e. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random, unordered sample from my control data for training, the error rate is much lower than if I take an ordered sample. This remains true for all my cross-validation results.
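To make the two sampling schemes concrete, here is a small sketch using the built-in iris data as a stand-in (an assumption; the original data are not shown). The iris rows happen to be sorted by class, so an "ordered" sample taken from the top of the data frame has a very different class mix from a random sample of the same size. Whether the control data in question are sorted in a comparable way is not known.

```r
set.seed(42)
data(iris)      # stand-in data; rows are sorted by Species
n_train <- 60   # hypothetical training-set size

# Ordered sample: simply the first n_train rows.
ordered_idx <- 1:n_train

# Random sample: n_train rows drawn without replacement.
random_idx <- sample(nrow(iris), n_train)

# Compare the class composition of the two training sets.
ordered_counts <- table(iris$Species[ordered_idx])
random_counts  <- table(iris$Species[random_idx])
print(ordered_counts)
print(random_counts)
```

Tabulating the response in each training sample like this is a quick way to check whether the two schemes are feeding the forest comparable data.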

I'm sorry for my many questions.
Many Thanks
Eleni Rapsomaniki
Received on Fri Jul 28 01:14:54 2006

