Date: Tue 30 Jan 2007 - 04:59:24 GMT

Hi Weiwei, thanks a lot for the detailed help! I tried option 2 in R, and it works pretty well. You mentioned that you also implemented RF. Could you please share your code with me? Thanks!

Betty
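
For readers following option 2 from the quoted reply below, here is a minimal sketch of such a wrapper in R. It assumes a two-class factor response and the randomForest package; the function name downsampled_rf, its arguments, and the synthetic data are hypothetical and not taken from the thread. Each forest sees all minority cases plus an equally sized random draw of majority cases, and the test-set vote fractions (my.rf$test$votes in the reply) are averaged across forests.

```r
library(randomForest)

# Hypothetical helper: fit one forest per down-sampled training set and
# average the test-set vote fractions over all forests.
downsampled_rf <- function(x, y, newdata, n_models = 10, ntree = 100) {
  stopifnot(is.factor(y), nlevels(y) == 2)
  counts <- table(y)
  minority <- names(counts)[which.min(counts)]
  idx_min <- which(y == minority)
  idx_maj <- which(y != minority)

  votes <- matrix(0, nrow = nrow(newdata), ncol = nlevels(y),
                  dimnames = list(NULL, levels(y)))
  for (i in seq_len(n_models)) {
    # Keep every minority case; draw an equally sized majority sample.
    maj_draw <- idx_maj[sample.int(length(idx_maj), length(idx_min))]
    keep <- c(idx_min, maj_draw)
    fit <- randomForest(x[keep, , drop = FALSE], y[keep],
                        xtest = newdata, ntree = ntree)
    # test$votes holds per-class vote fractions for the rows of newdata.
    votes <- votes + unclass(fit$test$votes)
  }
  votes / n_models
}

# Synthetic, made-up data: 25 "rare" cases among 500.
set.seed(1)
x <- data.frame(a = rnorm(500), b = rnorm(500))
y <- factor(rep(c("rare", "common"), times = c(25, 475)))
x$a[y == "rare"] <- x$a[y == "rare"] + 2

avg <- downsampled_rf(x, y, newdata = x[1:5, ])
```

Each row of avg sums to 1, so the "rare" column can be thresholded or ranked like any other score. OOB scores could be averaged the same way from my.rf$votes, with the caveat that cases left out of a down-sample have no OOB vote in that forest.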

On 1/29/07, Weiwei Shi <helprhelp@gmail.com> wrote:

> Hi, Betty:

> 1. Fortran code (
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f)
>       if(jclasswt.eq.0) then
>          do j=1,nclass
>             classwt(j)=1
>          enddo
>       endif
>       if(jclasswt.eq.1) then
> c     fill in classwt(j) for each j:
> c     classwt(1)=1.
> c     classwt(2)=10.
> You need to set jclasswt = 1 (you can find it by searching through the
> code), then uncomment the last two lines. That gives you classwt in
> Fortran. You can use classwt for extremely imbalanced classification
> problems. Down-sampling is another possible choice, but it is not
> directly implemented in rf. The following paper might help:
> http://oz.berkeley.edu/users/chenchao/666.pdf
> 2. As to the wrapper function, the idea is that you create a set of
> samples by applying some sampling probabilities to implement
> down-sampling, then build an rf model for each sample. Suppose you
> call rf this way for each sample:
> my.rf <- randomForest(...)
> Then you can access the OOB scores and the prediction scores through
> my.rf$votes and my.rf$test$votes, respectively.
>
> You can then average those scores yourself. It is just a simple
> meta-learning process, but it does exactly what down-sampling plus rf
> does, even though down-sampling is not implemented.
> 3. classwt and cutoff are used at different places. The former is used
> in two places: calculating the Gini criterion and calculating the final
> vote from the leaf. cutoff is used only in the final voting, so cutoff
> won't change the splitting while classwt can. However, since the
> current rf in R cannot do classwt, you can try cutoff to see if it
> helps in your case.
> 4. The fourth option is to use my implementation of rf, but I have not
> written a manual for it, and it cannot show you the splits yet.
> HTH,
> weiwei
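
The point in item 3 that cutoff only touches the final vote can be illustrated with a few lines of base R. As far as the randomForest documentation describes it, the winning class is the one with the largest ratio of vote fraction to cutoff. The helper apply_cutoff and the vote matrix below are made up for illustration and are not the package's internal code.

```r
# Made-up vote fractions for three cases (rows) over two classes (columns).
votes <- matrix(c(0.90, 0.10,
                  0.70, 0.30,
                  0.40, 0.60),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("common", "rare")))

# Hypothetical helper: pick the class with the largest votes/cutoff ratio.
apply_cutoff <- function(votes, cutoff) {
  ratio <- sweep(votes, 2, cutoff, "/")   # divide each column by its cutoff
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

apply_cutoff(votes, c(0.5, 0.5))  # "common" "common" "rare"
apply_cutoff(votes, c(0.8, 0.2))  # "common" "rare"   "rare"
```

The vote matrix is the same in both calls; only the labelling of the second case changes. That is consistent with the observation in the thread that cutoff alters the final classification without affecting how nodes are split.
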
> On 1/29/07, Betty Health <betty.health@gmail.com> wrote:
> > Thank you very much, Weiwei and Jim!
> >
> > Yeah, I did read the post by Andy, the contributor of this package. It
> > seems that classwt is not implemented yet. For Weiwei's options, I have
> > a few more questions. Thanks!
> >
> > "1. try to use rf in fortran by following the linky below
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
> >
> > I read the Fortran code briefly, but I did not find an option for
> > down-sampling. Does that mean I need to do the down-sampling myself?
> > Could you explain a little more about "2. make a wrapper function to do
> > the down sampling by yourself"? Do you mean I can do it in R or in
> > Fortran? Some links, please? I haven't done this before.
> >
> > Yeah, cutoff did change the final classification results. However,
> > from what I tried, it did not influence how the nodes are split. So I
> > will look further into the above 2 options.
> >
> > Thank you again!
> >
> > Betty
> >
> > On 1/28/07, Weiwei Shi <helprhelp@gmail.com> wrote:
> > > Dear Betty:
> > >
> > > I could suggest 3 options:
> > >
> > > 1. try to use rf in fortran by following the linky below
> > > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> > >
> > > 2. make a wrapper function to do the down sampling by yourself
> > >
> > > 3. try to use cutoff in randomForest, which might help in your
> > > situation.
> > >
> > > HTH,
> > >
> > > weiwei
> > >
> > > On 1/28/07, Betty Health <betty.health@gmail.com> wrote:
> > > > Hello there,
> > > >
> > > > I am working on an extremely unbalanced two-class classification
> > > > problem. I want to use "classwt" together with "down sampling". From
> > > > rfNews() in R, it looks like classwt is not working yet. Then I
> > > > looked at the software from Salford, but I did not find a
> > > > down-sampling option. I am wondering if you have any experience
> > > > dealing with this problem. Do you know of any method or software
> > > > that can handle it?
> > > >
> > > > Thank you very much!!
> > > >
> > > > Betty
> > > >
> > > --
> > > Weiwei Shi, Ph.D
> > > Research Scientist
> > > GeneGO, Inc.
> > >
> > > "Did you always know?"
> > > "No, I did not. But I believed..."
> > > ---Matrix III
> > >
