Re: [R] help with RandomForest classwt option

From: Weiwei Shi <helprhelp_at_gmail.com>
Date: Tue 30 Jan 2007 - 01:16:51 GMT

A fifth option, which might actually be the easiest:

"Boost" your minority class about 10-fold (just repeat each minority record 10 times), then run rf on the boosted sample. The learning process does not behave exactly like classwt (setting classwt[2] = 10 gives a weight of exactly 10 in the Gini calculation), but it is statistically similar.
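
To make the comparison concrete, here is a minimal sketch in plain Python (not R, and not randomForest's actual code). It assumes a class weight simply multiplies each record's contribution to the Gini impurity, as described above. The function names gini and replicate_minority and the toy node are made up for illustration.

```python
from collections import Counter


def gini(labels, class_weight=None):
    """Gini impurity of one node; a record of class y counts class_weight[y] times."""
    class_weight = class_weight or {}
    mass = Counter()
    for y in labels:
        mass[y] += class_weight.get(y, 1.0)
    total = sum(mass.values())
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def replicate_minority(labels, minority, times):
    """Repeat every minority record `times` times (the replication trick)."""
    out = []
    for y in labels:
        out.extend([y] * (times if y == minority else 1))
    return out


# Toy node (assumption): 90 majority records (class 0), 10 minority records (class 1).
node = [0] * 90 + [1] * 10

weighted = gini(node, class_weight={1: 10.0})                      # weight 10 on class 1
duplicated = gini(replicate_minority(node, minority=1, times=10))  # 10 copies instead

print(weighted, duplicated)
```

On a fixed set of records the two impurities coincide. Inside a forest they need not, because each tree bootstraps the replicated rows independently, so the number of copies of a given minority record varies from tree to tree. That is one reason to expect "statistically similar" rather than identical behavior.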

Be careful with the OOB error, though. It won't give you a correct estimate, since a record can be used in training while its duplicates end up out-of-bag. But if what you care about is the splitting, this approach helps, IMHO.
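
The OOB caveat can be checked with a small simulation. This is a sketch in plain Python under simplifying assumptions: a single bootstrap draw of the same size as the replicated data, with no stratification. It does not reproduce randomForest's sampling code, and oob_leak_fraction and its parameters are hypothetical names.

```python
import random


def oob_leak_fraction(n_major, n_minor, times, seed=0):
    """Fraction of out-of-bag minority rows whose original record is also in-bag.

    Each row stores the id of the record it was copied from; minority
    records are repeated `times` times before one bootstrap sample is drawn.
    """
    rng = random.Random(seed)
    rows = list(range(n_major))                      # majority: one row per record
    for rec in range(n_major, n_major + n_minor):    # minority: `times` rows per record
        rows.extend([rec] * times)

    n = len(rows)
    in_bag_idx = {rng.randrange(n) for _ in range(n)}   # bootstrap: n draws with replacement
    in_bag_records = {rows[i] for i in in_bag_idx}

    oob_minor = [rows[i] for i in range(n)
                 if i not in in_bag_idx and rows[i] >= n_major]
    if not oob_minor:
        return 0.0
    leaked = sum(1 for rec in oob_minor if rec in in_bag_records)
    return leaked / len(oob_minor)


print(oob_leak_fraction(900, 100, times=1))    # no replication
print(oob_leak_fraction(900, 100, times=10))   # 10-fold replication
```

Without replication, an out-of-bag row can never have a copy in the training sample. With 10 copies per minority record, nearly every out-of-bag minority row has a twin the tree has already seen, so the OOB error for that class will look better than it really is. An independent hold-out set, split off before replicating, avoids the problem.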

HTH, weiwei

On 1/29/07, Betty Health <betty.health@gmail.com> wrote:
> Thank you very much, Weiwei and Jim!
>
> Yeah, I did read the post by Andy, the contributor of this package. It seems
> that classwt is not implemented yet. For Weiwei's options, I have a few more
> questions. Thanks!
>
> "1. try to use rf in fortran by following the linky below
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
>
> I read the Fortran code briefly. But I did not find the options for down
> sampling. So does that mean I need to do down sampling myself? Could you
> explain a little more about "2. make a wrapper function to do the down
> sampling by yourself"? You mean I can do it in R or in Fortran? Some links
> plz? I haven't done this before.
>
> Yeah, cutoff did change the final classification results. However, from
> what I tried, it did not influence how the nodes are split, so I would
> rather pursue the two options above.
>
> Thank you again!
>
> Betty
>
> On 1/28/07, Weiwei Shi <helprhelp@gmail.com> wrote:
> > Dear Betty:
> >
> > I could suggest 3 options:
> >
> > 1. try to use rf in fortran by following the linky below
> >
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> >
> > 2. make a wrapper function to do the down sampling by yourself
> >
> > 3. try to use cutoff in randomForest, which might help in your situation.
> >
> > HTH,
> >
> > weiwei
> >
> > On 1/28/07, Betty Health < betty.health@gmail.com> wrote:
> > > Hello there,
> > >
> > > I am working on an extremely unbalanced two-class classification
> > > problem. I want to use "classwt" together with "down sampling". By
> > > checking rfNews() in R, it looks like classwt is not working yet. Then
> > > I looked at the software from Salford. I did not find the down sampling
> > > option. I am wondering if you have any experience dealing with this
> > > problem. Do you know any method or software that can handle it?
> > >
> > > Thank you very much!!
> > >
> > > Betty
> > >

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Jan 30 12:20:48 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 30 Jan 2007 - 04:30:25 GMT.
