Re: [R] help with RandomForest classwt option

From: Jim Porzak <jporzak_at_gmail.com>
Date: Mon 29 Jan 2007 - 04:19:43 GMT

See Andy's previous post on this.

-- 
HTH,
Jim Porzak
Loyalty Matrix Inc.
San Francisco, CA


===============
Liaw, Andy <andy_liaw@merck.com> 	Thu, Oct 27, 2005 at 8:37 AM
To: "David L. Van Brunt, Ph.D." <dlvanbrunt@gmail.com>, Gabor Grothendieck <ggrothendieck@gmail.com>
Cc: r-help@stat.math.ethz.ch
"classwt" in the current version of the randomForest package doesn't work
too well.  (It's what was in version 3.x of the original Fortran code by
Breiman and Cutler, not the one in the new Fortran code.)  I'd advise
against using it.

"sampsize" and "strata" can be used in conjunction.  If "strata" is not
specified, the class labels will be used.  Take the iris data as an example:

randomForest(Species ~ ., iris, sampsize=c(10, 30, 10))

says to randomly draw 10, 30 and 10 cases from the three species (with
replacement) to grow each tree.  If you are unsure of the order of the
labels, use a named vector, e.g.,

randomForest(Species ~ ., iris,
            sampsize=c(setosa=10, versicolor=30, virginica=10))
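To make the per-tree draw concrete, here is a minimal base-R sketch of what a per-class "sampsize" describes: for each stratum, a fixed number of rows sampled with replacement.  It does not call randomForest and is not the package's internal code; the helper name stratified_draw is hypothetical.

```r
# Hypothetical helper illustrating the draw described above: for each level
# of 'strata', sample sizes[level] row indices with replacement.
stratified_draw <- function(strata, sizes) {
  strata <- as.factor(strata)
  # Unnamed sizes are taken in level order, like the positional sampsize example
  if (is.null(names(sizes))) names(sizes) <- levels(strata)
  idx <- lapply(levels(strata), function(lev) {
    rows <- which(strata == lev)
    # sample.int() avoids the sample(x) surprise when a stratum has one row
    rows[sample.int(length(rows), sizes[[lev]], replace = TRUE)]
  })
  unlist(idx, use.names = FALSE)
}

set.seed(1)
rows <- stratified_draw(iris$Species,
                        c(setosa = 10, versicolor = 30, virginica = 10))
table(iris$Species[rows])
# counts: setosa 10, versicolor 30, virginica 10 (fixed by design, whatever the seed)
```

The per-stratum counts are fixed; only which rows get picked (and repeated) varies from tree to tree.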

Now, suppose you want the stratified sampling to use a variable other than
the class labels.  For example, with multi-center clinical trial data you
might want to draw the same number of patients per center to grow each
tree (I'm just making things up, not that that necessarily makes any sense).
You can do something like:

randomForest(..., strata=center,
            sampsize=rep(min(table(center)), nlevels(center)))

which draws the same number of patients (minimum at any center) from each
center to grow each tree.
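As a small check, the "sampsize" vector in the call above should have one entry per center, each equal to the size of the smallest center.  The sketch below computes it on simulated data; the 'center' factor and the two-class 'y' are hypothetical.  Since the class labels are used when "strata" is not given, the same expression applied to the labels gives a balanced down-sample of the majority class.

```r
# Hypothetical multi-center data; 'center' is made up for illustration
set.seed(42)
center <- factor(sample(c("A", "B", "C"), 200, replace = TRUE,
                        prob = c(0.6, 0.3, 0.1)))

# One entry per center, each equal to the smallest center's count
ss <- rep(min(table(center)), nlevels(center))
ss

# Same idea with class labels as strata: an unbalanced two-class problem
# where every tree sees as many "common" cases as "rare" ones
y <- factor(c(rep("common", 190), rep("rare", 10)))
rep(min(table(y)), nlevels(y))
# [1] 10 10
```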

Hope that's clear.  Eventually all such things will be in the yet-to-be-written
package vignette...

Andy


On 1/28/07, Betty Health <betty.health@gmail.com> wrote:

> Hello there,
>
> I am working on an extremely unbalanced two-class classification problem. I
> want to use "classwt" together with down-sampling. According to rfNews()
> in R, classwt does not seem to work yet. I then looked at the software
> from Salford, but I did not find a down-sampling option. I am wondering
> if you have any experience dealing with this problem. Do you know of any
> methods or software that can handle it?
>
> Thank you very much!!
>
> Betty
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Received on Mon Jan 29 15:27:08 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 29 Jan 2007 - 05:30:26 GMT.
