Re: [R] [handling] Missing [values in randomForest]

From: Kevin Bartz
Date: Tue 13 Sep 2005 - 09:17:20 EST

Hi Jan-Paul,

You definitely want to be careful with na.omit in randomForest -- that wipes out any row with even one NA. If NAs are sprawled throughout your dataset, na.omit might end up killing a lot of rows. Here's my usual MO for missing values:

  1. "impute" in Hmisc fills in gaps with the mean, median, most common value, etc.
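  2. rfImpute (in randomForest): starts from a rough median/mode fill, then refines the filled-in values over a few iterations using the forest's proximity matrix. It needs a response with no NAs.
  3. aregImpute (in Hmisc): multiple imputation. Instead of a forest, it predicts each variable with NAs from the others using additive regression on bootstrap samples, with predictive mean matching by default.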
  4. You may want to consider using a single tree ("rpart" package) in this case instead of a forest. Single trees deal with missing values cleanly through surrogate splits.
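
In case it helps, here is a rough sketch of the options above. It is only an illustration under my own assumptions: I use the built-in airquality data (NAs in Ozone and Solar.R) with Temp as a stand-in response, and names like n_complete, ozone_med, imputed and tree are made up for the example. It assumes randomForest, rpart and Hmisc are installed.

```r
# Sketch only: airquality and the variable names below are illustrative choices.
library(randomForest)  # randomForest(), rfImpute()
library(rpart)         # single tree with surrogate splits

data(airquality)
set.seed(1)

# How many rows does na.omit cost you?
n_all      <- nrow(airquality)
n_complete <- nrow(na.omit(airquality))
cat("rows:", n_all, " complete rows:", n_complete, "\n")

# randomForest stops on NAs by default (na.action = na.fail);
# na.omit makes it fit on the complete rows only.
rf_omit <- randomForest(Temp ~ ., data = airquality, na.action = na.omit)

# Option 1: simple fill with Hmisc::impute (median of the observed values).
ozone_med <- Hmisc::impute(airquality$Ozone, fun = median)

# Option 2: proximity-based imputation; the response (Temp) must have no NAs.
imputed <- rfImpute(Temp ~ ., data = airquality, iter = 5, ntree = 300)
rf_imp  <- randomForest(Temp ~ ., data = imputed)

# Option 4: a single tree keeps rows with missing predictors
# and routes them through surrogate splits.
tree <- rpart(Temp ~ ., data = airquality)
```

Comparing n_all with n_complete before you fit anything tells you quickly whether na.omit is affordable for your data.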

Good luck!


-----Original Message-----
On Behalf Of Uwe Ligges
Sent: Sunday, September 11, 2005 3:44 AM
To: Jan-Paul Roodbol
Subject: Re: [R] [handling] Missing [values in randomForest]

Jan-Paul Roodbol wrote:

> Does anyone know if randomForest in R can handle
> dataset with missings?

See ?randomForest, you can omit observations including NAs by specifying


Please do not cross-post!
Please specify a sensible subject!

Uwe Ligges

> Thank you
> Kind regards
> Jan-Paul
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!

Received on Tue Sep 13 09:27:56 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 16:57:12 EST