Re: [R] memory problems when combining randomForests

From: Eleni Rapsomaniki <>
Date: Tue 01 Aug 2006 - 02:45:37 EST

Hi Andy,

> I get a different order of importance for my variables depending on
> their order in the training data.

Perhaps answering my own question: the change in importance rankings could be attributed to the fact that, before passing my data to randomForest, I impute the missing values randomly (using the combined distributions of the positive and negative classes), so the data seen by RF differ slightly from run to run. Combine this with the fact that RF itself samples the data randomly, and it makes sense to see different rankings.
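To illustrate the point above: when both the imputation and the forest are randomized, single-run importance rankings will wobble, but averaging importances over several runs stabilizes the ranking. This is a minimal sketch in Python using scikit-learn's RandomForestClassifier as an analogue of R's randomForest (the dataset and all names here are illustrative, not from the original thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the pos/neg classification problem.
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

def ranking(seed):
    """Importance ranking from a single forest grown with one seed."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1]

# Single runs with different seeds may disagree, especially in the tail...
r1, r2 = ranking(1), ranking(2)

# ...but the mean importance over many seeds gives a much more stable ranking.
mean_imp = np.mean(
    [RandomForestClassifier(n_estimators=100, random_state=s)
     .fit(X, y).feature_importances_ for s in range(10)],
    axis=0)
stable_rank = np.argsort(mean_imp)[::-1]
```

The same idea applies in R: repeat the impute-then-fit cycle several times and average the `importance()` output before ranking variables.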

In a previous thread regarding simplifying variables, you say:
"The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate become biased downward. As you iterate more times, the "overfitting" becomes more and more severe (in the sense that, the OOB error rate will keep decreasing while error rate on an independent test set will be flat or increases)"

But if, every time you remove a variable, you evaluate on test data (i.e. data not used to train the model) and base the performance of the new, reduced model on the error rate from the confusion matrix for that test data, then this "overfitting" should not be an issue, right? (Unless of course you were referring to unsupervised learning.)
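The validation scheme described above can be sketched as backward variable elimination scored on a held-out test set: unlike the re-used OOB error, the held-out error stays honest because the test rows never influence which variables are dropped. Again a Python/scikit-learn sketch standing in for R's randomForest, with illustrative names throughout:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=4, random_state=0)
# Hold out a test set that plays no part in variable selection.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

kept = list(range(X.shape[1]))
test_errors = []
while len(kept) > 2:
    rf = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_tr[:, kept], y_tr)
    # Honest estimate: error on data the model never saw during selection.
    test_errors.append(1 - rf.score(X_te[:, kept], y_te))
    # Drop the currently least important variable and refit.
    kept.pop(int(np.argmin(rf.feature_importances_)))
```

If the held-out error starts rising as variables are removed, that is the signal to stop eliminating; the OOB error of the refitted forests, by contrast, tends to keep drifting downward.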

Best regards
Eleni Rapsomaniki
Birkbeck College, UK

Received on Tue Aug 01 03:57:08 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 01 Aug 2006 - 20:17:27 EST.
