From: Betty Health <betty.health_at_gmail.com>

Date: Tue 30 Jan 2007 - 04:59:24 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue Jan 30 16:06:50 2007

Hi Weiwei, thanks a lot for the detailed help! I tried option 2 in R, and it works pretty well. You mentioned that you also implemented RF yourself. Could you please share your code with me? Thanks!

Betty

On 1/29/07, Weiwei Shi <helprhelp@gmail.com> wrote:

> Hi, Betty:
>
> 1. Fortran code
> (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f):
>
>       if(jclasswt.eq.0) then
>          do j=1,nclass
>             classwt(j)=1
>          enddo
>       endif
>       if(jclasswt.eq.1) then
> c     fill in classwt(j) for each j:
> c     classwt(1)=1.
> c     classwt(2)=10.
>
> You need to set jclasswt = 1 (you can find it by searching through the
> code), then uncomment the last two lines. That gives you classwt in
> Fortran. You can use classwt for extremely imbalanced
> classification problems. Down-sampling is another possible choice,
> but it is not directly implemented in rf. Check the following
> paper; it might help:
> http://oz.berkeley.edu/users/chenchao/666.pdf
>
> 2. As to the wrapper function, the idea is that you can create a set
> of samples by applying some sampling probabilities to implement
> down-sampling, then build an rf model for each sample.
> Suppose you call rf this way for each sample:
> my.rf <- randomForest(...)
>
> Then you can access the OOB scores and the test-set prediction scores
> through my.rf$votes and my.rf$test$votes, respectively.
>
> You can then average those scores yourself. It is just like a
> simple meta-learning process, but it does exactly what down-sampling
> plus rf does, even though down-sampling is not implemented.
>
> 3. classwt and cutoff are used at different places. The former is used
> at two places: calculating the Gini criterion and calculating the final
> vote from the leaf, while cutoff is only used in the final voting. So
> cutoff won't change the splitting, while classwt can. However, since
> the current R rf cannot do classwt, you can try cutoff to see
> if it helps in your case.
>
> 4. The fourth option is to use my implementation of rf, but I did
> not write a manual for it, and it cannot show your splitting yet.
>
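The wrapper idea in option 2 could be sketched in R roughly as follows. This is only an illustration under stated assumptions, not Weiwei's code: the helper name `downsample_rf`, its arguments `n_models` and `ntree`, the 1:1 class ratio in each sample, and the toy data are all made up for the example. It uses the `xtest` argument of `randomForest()` and reads the per-class scores from `$test$votes`, as described above.

```r
library(randomForest)  # CRAN package; the only non-base dependency

# Hypothetical helper: fit n_models forests, each on all minority-class rows
# plus an equally sized random draw of majority-class rows, and average the
# test-set vote fractions over the forests.
downsample_rf <- function(x, y, xtest, n_models = 10, ntree = 100) {
  stopifnot(is.factor(y), nlevels(y) == 2)
  counts   <- table(y)
  minority <- names(counts)[which.min(counts)]
  min_idx  <- which(y == minority)
  maj_idx  <- which(y != minority)

  vote_sum <- matrix(0, nrow = nrow(xtest), ncol = 2,
                     dimnames = list(NULL, levels(y)))
  for (i in seq_len(n_models)) {
    # Down-sample: keep every minority row, draw as many majority rows.
    keep <- c(min_idx, sample(maj_idx, length(min_idx)))
    fit  <- randomForest(x[keep, , drop = FALSE], y[keep],
                         xtest = xtest, ntree = ntree)
    v <- unclass(fit$test$votes)             # plain matrix, one column per class
    vote_sum <- vote_sum + v / rowSums(v)    # rescale each row to sum to 1
  }
  vote_sum / n_models                        # averaged vote fractions
}

# Toy, made-up data with a rare positive class.
set.seed(1)
n <- 1000
x <- data.frame(a = rnorm(n), b = rnorm(n))
y <- factor(ifelse(x$a + rnorm(n, sd = 0.5) > 1.8, "rare", "common"))
avg_votes <- downsample_rf(x, y, xtest = x[1:5, ])
print(avg_votes)
```

Each row of `avg_votes` holds the averaged score per class for one test case. Averaging the OOB scores in `my.rf$votes` would need extra bookkeeping, because each forest sees a different subset of rows, so the sketch sticks to test-set votes.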
> HTH,
>
> weiwei
>
> On 1/29/07, Betty Health <betty.health@gmail.com> wrote:
> > Thank you very much, Weiwei and Jim!
> >
> > Yeah, I did read the post by Andy, the contributor of this package. It seems
> > that classwt is not implemented yet. For Weiwei's options, I have a few more
> > questions. Thanks!
> >
> > "1. try to use rf in fortran by following the link below
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
> >
> > I read the Fortran code briefly, but I did not find any option for
> > down-sampling. So does that mean I need to do the down-sampling myself? Could
> > you explain a little more about "2. make a wrapper function to do the down
> > sampling by yourself"? Do you mean I can do it in R or in Fortran? Could you
> > point me to some links, please? I haven't done this before.
> >
> > Yeah, cutoff did change the final classification results. However, from
> > what I tried, it did not influence how the nodes are split. So I would go
> > further with the above 2 options.
> >
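The observation that cutoff changes the final labels but not the splits can be illustrated without fitting any forest. According to the randomForest help page, the winning class is the one with the largest ratio of vote proportion to cutoff. The sketch below applies that rule to a made-up vote matrix; the helper name `apply_cutoff` and the numbers are hypothetical.

```r
# Apply a cutoff vector to a matrix of vote fractions
# (rows = cases, columns = classes); returns the winning class per row.
apply_cutoff <- function(votes, cutoff) {
  stopifnot(ncol(votes) == length(cutoff))
  ratio <- sweep(votes, 2, cutoff, "/")   # divide each column by its cutoff
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

# Made-up vote fractions for three cases.
votes <- matrix(c(0.70, 0.30,
                  0.85, 0.15,
                  0.40, 0.60),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("common", "rare")))

print(apply_cutoff(votes, c(0.5, 0.5)))  # "common" "common" "rare" (majority vote)
print(apply_cutoff(votes, c(0.8, 0.2)))  # "rare"   "common" "rare" (favours "rare")
```

The vote matrix is identical in both calls; only the rule that turns votes into labels changes, which is consistent with the trees and their splits being unaffected by cutoff.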
> > Thank you again!
> >
> > Betty
> >
> > On 1/28/07, Weiwei Shi <helprhelp@gmail.com> wrote:
> > > Dear Betty:
> > >
> > > I could suggest 3 options:
> > >
> > > 1. try to use rf in fortran by following the link below
> > > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> > >
> > > 2. make a wrapper function to do the down sampling by yourself
> > >
> > > 3. try to use cutoff in randomForest, which might help in your situation.
> > >
> > > HTH,
> > >
> > > weiwei
> > >
> > > On 1/28/07, Betty Health <betty.health@gmail.com> wrote:
> > > > Hello there,
> > > >
> > > > I am working on an extremely unbalanced two-class classification problem. I
> > > > want to use "classwt" together with "down sampling". From checking rfNews()
> > > > in R, it looks like classwt is not working yet. Then I looked at the
> > > > software from Salford, but I did not find a down-sampling option. I am
> > > > wondering if you have any experience dealing with this problem. Do you know
> > > > of any method or software that can handle it?
> > > >
> > > > Thank you very much!!
> > > >
> > > > Betty
> > > >
> > > --
> > > Weiwei Shi, Ph.D
> > > Research Scientist
> > > GeneGO, Inc.
> > >
> > > "Did you always know?"
> > > "No, I did not. But I believed..."
> > > ---Matrix III


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Tue 30 Jan 2007 - 06:30:25 GMT.
