Re: [R] randomForest and missing data

From: Sicotte, Hugues Ph.D. <Sicotte.Hugues_at_mayo.edu>
Date: Thu 04 Jan 2007 - 21:44:27 GMT


I don't know this package, but the general answer is that missing data can affect your model. If your data are missing at random, you may get lucky in your model building.

If, however, your data are not missing at random (e.g. censoring), you might build a wrong predictor.

Missing at random or not, that is a question you should answer and deal with before modeling.
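
To make the distinction concrete, here is a minimal sketch in base R. It uses simulated data (nothing here comes from a real study), and the cutoff of 60 and the 30% drop rate are arbitrary assumptions chosen only for illustration. It compares the mean of a variable when values are dropped purely at random with the mean when every large value goes unrecorded, as in a censoring-like mechanism.

```r
# Illustration only: simulated data and arbitrary thresholds.
set.seed(1)
n <- 10000
x <- rnorm(n, mean = 50, sd = 10)

# Dropped at random: each value is lost with probability 0.3,
# regardless of its size.
x_random <- ifelse(runif(n) < 0.3, NA, x)

# Not missing at random: values above 60 are never recorded
# (a censoring-like mechanism).
x_censored <- ifelse(x > 60, NA, x)

# Compare the observed means with the mean of the complete data.
means <- c(full     = mean(x),
           random   = mean(x_random, na.rm = TRUE),
           censored = mean(x_censored, na.rm = TRUE))
print(round(means, 2))
```

In this setup the mean under random dropping should stay close to the full-data mean, while the censored version is pulled downward, because the missingness depends on the value itself.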

I refer you to a book like "Analysis of Incomplete Multivariate Data" by J. L. Schafer.

If there is a way around that with randomForest, I'd be interested to know too.

Hugues Sicotte

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Darin A. England
Sent: Thursday, January 04, 2007 3:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] randomForest and missing data

Does anyone know a reason why, in principle, a call to randomForest cannot accept a data frame with missing predictor values? If each individual tree is built using CART, then it seems like this should be possible. (I understand that one may impute missing values using rfImpute or some other method, but I would like to avoid doing that.)
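
For context, the behaviour described above can be sketched as follows. This is an illustration, not part of the original post: it assumes the randomForest package is installed, uses the built-in iris data as a stand-in, and injects missing values artificially. It shows the default failure on missing predictors and the two imputation routes the package offers (na.roughfix and rfImpute).

```r
library(randomForest)

set.seed(1)
dat <- iris
# Knock out some predictor values at random (illustrative only).
dat[sample(nrow(dat), 20), "Sepal.Length"] <- NA
dat[sample(nrow(dat), 20), "Petal.Width"]  <- NA

# The formula interface defaults to na.action = na.fail,
# so this call stops with an error on missing predictors.
res <- try(randomForest(Species ~ ., data = dat), silent = TRUE)
inherits(res, "try-error")

# Route 1: rough fill (column median for numerics, most frequent
# level for factors) applied through na.action.
rf_rough <- randomForest(Species ~ ., data = dat, na.action = na.roughfix)

# Route 2: proximity-based imputation, then a fit on the completed data.
dat_imp <- rfImpute(Species ~ ., data = dat)
rf_imp  <- randomForest(Species ~ ., data = dat_imp)
```

Both routes fill in values before the trees are grown; neither uses surrogate splits on the incomplete data.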

If this functionality were available, then when the trees are being constructed and when subsequent data are put through the forest, one would also specify an argument for the use of surrogate rules, just like in rpart.
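
As a point of comparison, here is a minimal sketch of how surrogate rules are requested in rpart for a single tree. The iris data, the injected missing values, and the particular control settings are assumptions made for illustration; randomForest itself has no equivalent argument.

```r
library(rpart)

set.seed(1)
dat <- iris
# Remove some values of one predictor (illustrative only).
dat[sample(nrow(dat), 30), "Petal.Length"] <- NA

# rpart keeps rows with missing predictors and can store surrogate
# splits; usesurrogate = 2 falls back to the majority direction
# when all surrogates are missing as well.
fit <- rpart(Species ~ ., data = dat,
             control = rpart.control(usesurrogate = 2, maxsurrogate = 5))

# New cases with a missing Petal.Length are still sent down the tree.
newdata <- iris[c(1, 51, 101), ]
newdata$Petal.Length <- NA
pred <- predict(fit, newdata, type = "class")
print(pred)
```

Calling summary(fit) lists the surrogate splits stored at each node, which is the information a forest-level implementation would need to keep for every tree.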

I realize this question is very specific to randomForest, as opposed to R in general, but any comments are appreciated. I suppose I am looking for someone to say "It's not appropriate, and here's why ..." or "Good idea. Please implement and post your code."

Thanks,

Darin England, Senior Scientist
Ingenix



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri Jan 05 08:49:12 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.