Re: [R] help with RandomForest classwt option

From: Betty Health <betty.health_at_gmail.com>
Date: Tue 30 Jan 2007 - 04:59:24 GMT

Hi Weiwei, thanks a lot for the detailed help!! I tried option 2 in R. It works pretty well! You mentioned that you also implemented RF. Could you please share your code with me? Thanks!

Betty

On 1/29/07, Weiwei Shi <helprhelp@gmail.com> wrote:
>
> Hi, Betty:
>
> 1. Fortran code (
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f)
>
> if(jclasswt.eq.0) then
>    do j=1,nclass
>       classwt(j)=1
>    enddo
> endif
> if(jclasswt.eq.1) then
> c     fill in classwt(j) for each j:
> c     classwt(1)=1.
> c     classwt(2)=10.
>
> You need to set jclasswt = 1 (you can find it by searching through the
> code),
> then uncomment the last two lines. That gives you classwt in
> Fortran. You can use classwt for extremely imbalanced
> classification problems. Down-sampling is another possible choice for
> that, but it is not directly implemented in rf. Check the following
> paper; it might help.
> http://oz.berkeley.edu/users/chenchao/666.pdf
>
> 2. As to the wrapper function, the idea is that you can create a set
> of samples by applying some sampling probabilities to implement
> down-sampling. Then build one rf model for each sample.
> Suppose you call rf in this way for each sample:
> my.rf <- randomForest(...)
>
> Then you can access the OOB scores and the test-set prediction scores
> through my.rf$votes and my.rf$test$votes, respectively.
>
> Then you can average those scores yourself. It is just like a
> simple meta-learning process, but it does exactly what down-sampling
> plus rf does, though down-sampling is not implemented.
>
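A minimal sketch of the option-2 wrapper described above, under stated assumptions: the helper names balanced_indices, average_votes and downsampled_rf_votes are made up for illustration and are not part of the randomForest package, every class is down-sampled to the size of the smallest class, and the test-set scores are taken from predict(..., type = "prob") rather than from my.rf$test$votes.

```r
# Hypothetical helpers for illustration; only randomForest() and predict()
# come from the randomForest package.

# Draw a balanced down-sample: each class is cut to the minority-class size.
balanced_indices <- function(y) {
  y <- droplevels(as.factor(y))
  n_min <- min(table(y))
  idx_by_class <- split(seq_along(y), y)
  unlist(lapply(idx_by_class,
                function(idx) idx[sample.int(length(idx), n_min)]),
         use.names = FALSE)
}

# Average a list of vote matrices (same rows, same class columns).
average_votes <- function(vote_list) {
  Reduce(`+`, vote_list) / length(vote_list)
}

# Fit one forest per balanced sample and average the test-set vote fractions.
# x, xtest: predictor data frames; y: class labels; ...: passed to randomForest.
downsampled_rf_votes <- function(x, y, xtest, n_models = 10, ...) {
  y <- droplevels(as.factor(y))
  votes <- lapply(seq_len(n_models), function(i) {
    idx <- balanced_indices(y)
    fit <- randomForest::randomForest(x[idx, , drop = FALSE], y[idx], ...)
    predict(fit, xtest, type = "prob")
  })
  average_votes(votes)
}
```

With the package installed, a call would look like downsampled_rf_votes(x, y, xtest, n_models = 25, ntree = 200); the result is a matrix of averaged vote fractions with one column per class.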
>
> 3. classwt and cutoff are used at different places. The former is used
> at two places: calculating the Gini criterion and calculating the final
> vote from the leaf, while cutoff is only used in the final voting. So
> cutoff won't change the splitting, while classwt can. However, since
> the current rf in R cannot do classwt, you can try cutoff to see
> if it helps in your case.
>
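To make the difference in point 3 concrete, here is a small base-R sketch. It is an illustration only, not the code used inside the Fortran or R implementations: weighted_gini and vote_with_cutoff are hypothetical names, the class weights are assumed to simply rescale the class counts in a node before the Gini impurity is computed, and the cutoff rule follows the randomForest documentation (the winning class is the one with the largest ratio of vote fraction to cutoff).

```r
# Gini impurity of a node from its class counts, with optional class weights.
# Assumption: weights rescale the class counts before impurity is computed.
weighted_gini <- function(counts, classwt = rep(1, length(counts))) {
  w <- counts * classwt
  p <- w / sum(w)
  1 - sum(p^2)
}

# Final vote with a cutoff: the winner is the class with the largest
# ratio of vote fraction to cutoff. votes has one column per class.
vote_with_cutoff <- function(votes, cutoff) {
  ratio <- sweep(votes, 2, cutoff, "/")
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

counts <- c(rare = 5, common = 95)
weighted_gini(counts)                      # impurity with equal weights
weighted_gini(counts, classwt = c(10, 1))  # rare class up-weighted

votes <- matrix(c(0.3, 0.7), nrow = 1,
                dimnames = list(NULL, c("rare", "common")))
vote_with_cutoff(votes, cutoff = c(0.5, 0.5))  # "common"
vote_with_cutoff(votes, cutoff = c(0.2, 0.8))  # "rare"
```

The weights change the impurity that drives the choice of split, so they can change the trees themselves. The cutoff only changes how a fixed set of votes is turned into a label.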
> 4. The fourth option is to use my implementation of rf, but I did
> not write a manual for it, and it cannot show the splits yet.
>
> HTH,
>
> weiwei
>
>
>
>
> On 1/29/07, Betty Health <betty.health@gmail.com> wrote:
> > Thank you very much, Weiwei and Jim!
> >
> > Yeah, I did read the post by Andy, the contributor of this package. It seems
> > that classwt is not implemented yet. For Weiwei's options, I have a few more
> > questions. Thanks!
> >
> > "1. try to use rf in fortran by following the linky below
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
> >
> > I read the Fortran code briefly, but I did not find an option for down
> > sampling. So does that mean I need to do the down sampling myself? Could you
> > explain a little more about "2. make a wrapper function to do the down
> > sampling by yourself"? Do you mean I can do it in R or in Fortran? Some links
> > plz? I haven't done this before.
> >
> > Yeah, cutoff did change the final classification results. However, from
> > what I tried, it did not influence how the nodes are split. So I would go
> > further with the above 2 options.
> >
> > Thank you again!
> >
> > Betty
> >
> >
> >
> >
> > On 1/28/07, Weiwei Shi <helprhelp@gmail.com> wrote:
> > > Dear Betty:
> > >
> > > I could suggest 3 options:
> > >
> > > 1. try to use rf in fortran by following the linky below
> > >
> > > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> > >
> > > 2. make a wrapper function to do the down sampling by yourself
> > >
> > > 3. try to use cutoff in randomForest, which might help in your situation.
> > >
> > > HTH,
> > >
> > > weiwei
> > >
> > > On 1/28/07, Betty Health < betty.health@gmail.com> wrote:
> > > > Hello there,
> > > >
> > > > I am working on an extremely unbalanced two-class classification problem. I
> > > > want to use "classwt" together with "down sampling". By checking the rfNews()
> > > > in R, it looks like classwt is not working yet. Then I looked at the
> > > > software from Salford. I did not find the down sampling option. I am
> > > > wondering if you have any experience dealing with this problem. Do you know
> > > > any method or software that can handle this problem?
> > > >
> > > > Thank you very much!!
> > > >
> > > > Betty
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help@stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > >
> > >
> > > --
> > > Weiwei Shi, Ph.D
> > > Research Scientist
> > > GeneGO, Inc.
> > >
> > > "Did you always know?"
> > > "No, I did not. But I believed..."
> > > ---Matrix III
> > >
> >
> >
>

Received on Tue Jan 30 16:06:50 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 30 Jan 2007 - 06:30:25 GMT.
