Re: [R] Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

From: David L. Van Brunt, Ph.D. <dlvanbrunt_at_gmail.com>
Date: Fri 28 Oct 2005 - 03:37:04 EST

Perfect! More useful than I was even hoping for. Great help, many thanks!

On 10/27/05, Liaw, Andy <andy_liaw@merck.com> wrote:
>
> "classwt" in the current version of the randomForest package doesn't work
> too well. (It's what was in version 3.x of the original Fortran code by
> Breiman and Cutler, not the one in the new Fortran code.) I'd advise
> against using it.
>
> "sampsize" and "strata" can be used in conjunction. If "strata" is not
> specified, the class labels will be used. Take the iris data as an
> example:
>
> randomForest(Species ~ ., iris, sampsize=c(10, 30, 10))
>
> says to randomly draw 10, 30 and 10 from the three species (with
> replacement) to grow each tree. If you are unsure of the order of the
> labels, use a named vector, e.g.,
>
> randomForest(Species ~ ., iris,
> sampsize=c(setosa=10, versicolor=30, virginica=10))
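
The label order that an unnamed "sampsize" vector relies on can be checked with levels(). A minimal sketch, assuming the randomForest package is installed (the fit is skipped otherwise); the object names lev, ss and fit, and the seed, are illustrative only:

```r
# Order of the class labels that an unnamed sampsize vector lines up with
lev <- levels(iris$Species)
print(lev)

# Named per-class sample sizes, written in any order, then put in level order
ss <- c(versicolor = 30, setosa = 10, virginica = 10)[lev]

# No class can be asked for more cases than it actually has
stopifnot(all(ss <= table(iris$Species)[lev]))

# Fit only if the package is available (assumption: it may not be installed)
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed, only for reproducibility
  fit <- randomForest::randomForest(Species ~ ., data = iris, sampsize = ss)
  print(fit$confusion)
}
```

With names attached, the order in which the sizes are typed no longer matters.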
>
> Now, if you want the stratified sampling to be done using a different
> variable than the class labels; e.g., for multi-centered clinical trial
> data, you want to draw the same number of patients per center to grow each
> tree (I'm just making things up, not that that necessarily makes any
> sense), you can do something like:
>
> randomForest(..., strata=center,
>              sampsize=rep(min(table(center)), nlevels(center)))
>
> which draws the same number of patients (minimum at any center) from each
> center to grow each tree.
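
The per-center "sampsize" vector described above can be built and checked in base R before any forest is grown. A rough sketch on simulated data: the data frame trial, its columns (center, x1, x2, outcome) and the center sizes are all made up for illustration, and the fit is guarded in case randomForest is not installed:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical multi-center data: three centers of unequal size
trial <- data.frame(
  center = factor(rep(c("A", "B", "C"), times = c(40, 25, 60))),
  x1     = rnorm(125),
  x2     = rnorm(125)
)
trial$outcome <- factor(ifelse(trial$x1 + rnorm(125) > 0, "1", "0"))

# Same number of patients from every center: the size of the smallest one
n_min      <- min(table(trial$center))
per_center <- rep(n_min, nlevels(trial$center))
print(per_center)

# Stratify on center rather than on the class labels
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(
    x = trial[, c("x1", "x2")], y = trial$outcome,
    strata = trial$center, sampsize = per_center
  )
  print(fit$confusion)
}
```

The x/y interface is used here so that "strata" is guaranteed to line up row by row with the predictors.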
>
> Hope that's clear. Eventually all such things will be in the yet to be
> written package vignette...
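
Applied to the "0" v "1" question further down the thread (one class under 10% of about 2,000 cases), the same "sampsize" mechanism can draw equal numbers from both classes for each tree. A minimal sketch on simulated data; the data frame d, its columns, and the choice of drawing the full minority count from each class are illustrative assumptions, not something stated in the thread:

```r
set.seed(2005)  # arbitrary seed, only for reproducibility
n <- 2000

# Simulated unbalanced two-class data; class "1" is the rare one (roughly 10%)
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 + rnorm(n) > 1.8, "1", "0"))
print(table(d$y))

# Draw as many cases from each class as the rarer class contains
n_minor <- min(table(d$y))
ss      <- c("0" = n_minor, "1" = n_minor)

# Class labels are the strata by default, so only sampsize is needed
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(y ~ x1 + x2, data = d, sampsize = ss)
  print(fit$confusion)
}
```

Whether balanced draws actually improve predictions depends on the data; comparing the per-class error in the confusion matrix against a default fit is one way to find out.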
>
> Andy
>
>
> > From: David L. Van Brunt, Ph.D.
> >
> > I have read both the help files and that article... the article very
> > nicely evaluates the value of dealing with unbalanced data, and the help
> > files show that you can, but offer no guidance in terms of how the syntax
> > should be specified. The "strata" and "classwt" clearly can be specified,
> > but it's not shown how to specify the values...
> >
> > The examples do not include specifications of those terms, and every
> > guess I've made has generated an error....
> >
> >
> > On 10/27/05, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:
> > >
> > > See
> > >
> > > > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/40898.html
> > >
> > > On 10/27/05, David L. Van Brunt, Ph.D. <dlvanbrunt@gmail.com> wrote:
> > > > Sorry for the repost, but I've really been looking, and can't find
> > > > any syntax direction on this issue...
> > > >
> > > > Just browsing the documentation, and searching the list came up
> > > > short... I have some unbalanced data and was wondering if, in a "0" v
> > > > "1" classification forest, some combo of these options might yield
> > > > better predictions when the proportion of one class is low (less than
> > > > 10% in a sample of 2,000 observations).
> > > >
> > > > Not sure how to specify these terms... from the docs, we have:
> > > >
> > > > classwt: Priors of the classes. Need not add up to one. Ignored for
> > > > regression.
> > > >
> > > > So is this something like "... classwt=c(.90,.10)" ? I didn't see the
> > > > syntax demonstrated. Similar for "strata" and "sampsize" though there
> > > > is a default for sampsize that makes sense... not sure how you would
> > > > make "a vector of the length the number of strata", however....
> > > >
> > > > Pointers?
> > > >
> > > > --
> > > > ---------------------------------------
> > > > David L. Van Brunt, Ph.D.
> > > > mailto:dlvanbrunt@gmail.com
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help@stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> > > >
> > >
> >
> >
> >
> > --
> > ---------------------------------------
> > David L. Van Brunt, Ph.D.
> > mailto:dlvanbrunt@gmail.com
> >
> > [[alternative HTML version deleted]]
> >
> >
> >
>
>
>



Received on Fri Oct 28 04:24:21 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:40:50 EST