Re: [R] randomForest and missing data

From: Weiwei Shi <helprhelp_at_gmail.com>
Date: Thu 04 Jan 2007 - 22:50:20 GMT

You can try randomForest in Fortran codes, which has that function doing missing replacement automatically. There are two ways of imputations (one is fast and the other is time-consuming) to do that. I did it long time ago.

the link is below. If you have any question, just let me know. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

In principle, each individual tree is NOT a cart tree since each splitting predictor is randomly selected. In my impression, rf is more like nearest neighbor algorithm. The surrogation is NOT used in rf implementation. That's "why" you have to impute it before using it; while the imputation is not implemented in r-version, in my best knowledge.
You can check that from reading the original technical report or some presentation by original authors. I remember there was some slide comparing rf and CART somewhere.

HTH, weiwei

On 1/4/07, Darin A. England <england@cs.umn.edu> wrote:
>
> Does anyone know a reason why, in principle, a call to randomForest
> cannot accept a data frame with missing predictor values? If each
> individual tree is built using CART, then it seems like this
> should be possible. (I understand that one may impute missing values
> using rfImpute or some other method, but I would like to avoid doing
> that.)
>
> If this functionality were available, then when the trees are being
> constructed and when subsequent data are put through the forest, one
> would also specify an argument for the use of surrogate rules, just
> like in rpart.
>
> I realize this question is very specific to randomForest, as opposed
> to R in general, but any comments are appreciated. I suppose I am
> looking for someone to say "It's not appropriate, and here's why
> ..." or "Good idea. Please implement and post your code."
>
> Thanks,
>
> Darin England, Senior Scientist
> Ingenix
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri Jan 05 10:35:47 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 05 Jan 2007 - 00:30:25 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.