RE: [R] Random Forest with highly imbalanced data


From: Liaw, Andy (andy_liaw@merck.com)
Date: Thu 13 May 2004 - 05:54:26 EST


Message-id: <3A822319EB35174CA3714066D590DCD504AF7D86@usrymx25.merck.com>

Breiman & Cutler's version 5 of the Fortran code implements a weighting
scheme that is more effective than the old classwt. Basically:

1. Class weights are used in computing the Gini index.
2. At terminal nodes, weighted votes are taken to determine the prediction
for the node.
3. Average weights within terminal nodes are computed, and used as weights
for the final weighted vote.

This has not been implemented in the R version of the package (and is one of
the reasons the version number for the package is still 4.x-y instead of
5.x-y). Do note that one usually needs to `tune' the class weights a bit to
get the desired result.
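
As a rough illustration of points 1 and 2 above, here is a minimal base-R
sketch of a class-weighted Gini index and a weighted vote within a node. It
is not the Fortran code and not part of the randomForest package: the
function names weighted_gini and weighted_vote, and the weights used, are
made up for this example, and point 3 is left out.

```r
# Class-weighted Gini impurity of a node.
# y: class labels of the cases in the node.
# w: named vector of class weights (hypothetical values in the calls below).
weighted_gini <- function(y, w) {
  counts <- table(factor(y, levels = names(w)))  # raw class counts
  wc <- as.numeric(counts) * w                   # weighted class counts
  if (sum(wc) == 0) return(0)
  p <- wc / sum(wc)                              # weighted class proportions
  1 - sum(p^2)
}

# Weighted vote: the class with the largest weighted count wins the node.
weighted_vote <- function(y, w) {
  counts <- table(factor(y, levels = names(w)))
  names(w)[which.max(as.numeric(counts) * w)]
}

# A node with 90 cases of class "0" and 10 of class "1".
y <- c(rep("0", 90), rep("1", 10))

weighted_gini(y, c("0" = 1, "1" = 1))  # equal weights: 0.18
weighted_gini(y, c("0" = 1, "1" = 9))  # weights offset the imbalance: 0.5

# A node with 8 cases of "0" and 2 of "1": weighted counts are 8 vs. 18.
weighted_vote(c(rep("0", 8), rep("1", 2)), c("0" = 1, "1" = 9))  # "1"
```

With equal weights the node looks fairly pure; up-weighting the rare class
makes the same node look maximally mixed, so splits that isolate the rare
class gain more.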

The current version of the R package does offer the sampsize option; i.e.,
randomForest(..., sampsize=c(100, 100), ...) will draw 100 cases within each
class, with replacement, to grow each tree. (This is the `down-sampling'
approach.) We have found this to work quite well in general.
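
A sketch of what such a call might look like on a two-class problem with
about 10% positives follows. The data are simulated noise and the column
names are invented, so the fit itself is meaningless; only the sampling
setup is of interest. The strata argument is passed explicitly, and the
randomForest call is guarded so that the snippet still runs where the
package is not installed.

```r
set.seed(42)

# Simulated stand-in for an imbalanced data set (hypothetical columns).
n <- 5000
x <- data.frame(age = rnorm(n, 65, 10), prior_visits = rpois(n, 2))
y <- factor(rbinom(n, 1, 0.10), levels = c(0, 1))

n_per_class <- 100  # cases drawn from each class for every tree

# What the sample for a single tree looks like, in base R only:
# n_per_class row indices per class, drawn with replacement.
idx <- unlist(lapply(split(seq_along(y), y),
                     function(i) sample(i, n_per_class, replace = TRUE)))
table(y[idx])  # 100 cases of each class

if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(x, y,
                                    strata = y,
                                    sampsize = c(n_per_class, n_per_class),
                                    ntree = 200)
  print(fit$confusion)  # out-of-bag confusion matrix
}
```

Every tree then sees a balanced sample, while different trees see different
subsets of the majority class.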

[Advertisement: I will present both at the Interface in a few weeks.]

Best,
Andy

> From: Kel
>
> Hi group,
>
> I am trying to do a RF with approx 250,000
> cases. My objective is to determine the risk factors
> of a person being readmitted to hospital (response=1)
> or else (response=0). Only 10%, or 25,000 cases, were
> readmitted. I've heard about the down-sampling and class
> weight approaches and am wondering if R can do them. Even
> some references to articles would help.
>
> From a statistical point of view, is there any rule
> of thumb for the positive/negative response ratio at
> which an adjustment has to be applied?
>
> Thank you so much.
>
> Regards,
> Kelvin
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>


This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:09 EST