Re: [R] help with RandomForest classwt option

From: Weiwei Shi <helprhelp_at_gmail.com>
Date: Tue 30 Jan 2007 - 00:47:33 GMT

Hi, Betty:

  1. Fortran code (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f)
	if(jclasswt.eq.0) then
		do j=1,nclass
			classwt(j)=1
		enddo
	endif
	if(jclasswt.eq.1) then
c		fill in classwt(j) for each j:
c		classwt(1)=1.
c		classwt(2)=10.

Set jclasswt = 1 (you can find it by searching through the code), then uncomment the last two lines and fill in the weight you want for each class. That gives you classwt in Fortran, which you can use for extremely imbalanced classification problems. Down-sampling is another possible choice, but it is not directly implemented in rf. The following paper might help:
http://oz.berkeley.edu/users/chenchao/666.pdf

2. As to the wrapper function, the idea is to create a set of samples by applying sampling probabilities that implement the down-sampling, then build an rf model for each sample. Suppose you call rf this way for each sample: my.rf <- randomForest(...)

Then you can access the OOB scores and the prediction scores through my.rf$votes and my.rf$test$votes, respectively (the latter is there only if you passed a test set via xtest).

Then you can average those scores yourself. It is just a simple meta-learning process, but it does exactly what down-sampling plus rf does, even though down-sampling itself is not implemented.
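
A minimal sketch of such a wrapper, written in plain Python so that it runs without any package. It is not the randomForest API: fit_and_score is a hypothetical callback standing in for "fit one forest on the kept rows and return its test-set vote fractions" (roughly the role of my.rf$test$votes in R), and every function name below is made up for illustration. The balanced 1:1 draw is just one assumed choice of sampling probabilities.

```python
import random


def downsample_indices(labels, minority, rng):
    """Keep every minority case plus an equal-sized random draw of the rest."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    k = min(len(minority_idx), len(majority_idx))
    return minority_idx + rng.sample(majority_idx, k)


def average_votes(vote_tables):
    """Element-wise mean of several (n_cases x n_classes) vote tables."""
    n = len(vote_tables)
    return [[sum(col) / n for col in zip(*rows)] for rows in zip(*vote_tables)]


def downsampled_ensemble(labels, minority, fit_and_score, n_rounds=10, seed=0):
    """Fit one model per down-sampled training set and average the votes."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_rounds):
        keep = downsample_indices(labels, minority, rng)
        # Hypothetical callback: train on rows `keep`, return test-set votes.
        tables.append(fit_and_score(keep))
    return average_votes(tables)


# Toy data: 95 majority cases and 5 minority cases.
labels = ["neg"] * 95 + ["pos"] * 5


def toy_fit_and_score(keep):
    # Stand-in for a real forest: every one of 3 test cases simply gets
    # the class fractions of the down-sampled training set as its "votes".
    pos = sum(labels[i] == "pos" for i in keep) / len(keep)
    return [[1 - pos, pos]] * 3


avg = downsampled_ensemble(labels, "pos", toy_fit_and_score, n_rounds=5)
print(avg)
```

With a real forest, the callback would fit on the kept rows and return one row of class vote fractions per test case; the averaging step stays the same.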

3. classwt and cutoff are used in different places. The former is used in two places: calculating the Gini criterion and calculating the final vote from the leaf. cutoff, by contrast, is used only in the final voting. So cutoff won't change the splitting, while classwt can. However, since R's current rf cannot do classwt, you can try cutoff to see whether it helps in your case.
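
To make the difference concrete, here is a small Python sketch under stated assumptions. weighted_gini shows one common way class weights can enter the Gini impurity (each class count is scaled by its weight before the proportions are taken); I am assuming that form for illustration, not quoting the Fortran code. cutoff_vote follows the rule given in the randomForest documentation, where the winning class is the one with the largest ratio of vote fraction to cutoff. Both function names are hypothetical.

```python
def weighted_gini(counts, classwt):
    """Gini impurity of one node after scaling each class count by its weight."""
    weighted = [n * w for n, w in zip(counts, classwt)]
    total = sum(weighted)
    if total == 0:
        return 0.0
    return 1.0 - sum((v / total) ** 2 for v in weighted)


def cutoff_vote(votes, cutoff):
    """Index of the class with the largest vote-fraction / cutoff ratio."""
    ratios = [v / c for v, c in zip(votes, cutoff)]
    return max(range(len(ratios)), key=ratios.__getitem__)


# A node holding 90 majority cases and 10 minority cases.
print(weighted_gini([90, 10], [1.0, 1.0]))   # unweighted impurity
print(weighted_gini([90, 10], [1.0, 10.0]))  # minority class up-weighted 10x

# Vote fractions of a finished forest for one case: 70% majority, 30% minority.
print(cutoff_vote([0.7, 0.3], [0.5, 0.5]))   # equal cutoffs: picks class 0
print(cutoff_vote([0.7, 0.3], [0.8, 0.2]))   # cutoff favouring the minority: picks class 1
```

The weights change the impurity of the node, and therefore which split looks best while the tree is grown. The cutoff only re-reads the votes of trees that are already built, which is why it cannot change the splits.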

4. The fourth option is to use my implementation of rf, but I have not written a manual for it, and it cannot show the splitting yet.

HTH, weiwei

On 1/29/07, Betty Health <betty.health@gmail.com> wrote:
> Thank you very much, Weiwei and Jim!
>
> Yeah, I did read the post by Andy, the contributor of this package. It seems
> that classwt is not implemented yet. For Weiwei's options, I have a few more
> questions. Thanks!
>
> "1. try to use rf in fortran by following the linky below
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
>
> I read the Fortran code briefly. But I did not find the options for down
> sampling. So does that mean I need to do down sampling myself? Could you
> explain a little more about "2. make a wrapper function to do the down
> sampling by yourself"? You mean I can do it in R or in Fortran? Some links
> plz? I haven't done this before.
>
> Yeah, cutoff did change the final classification results. However, from
> what I tried, it did not influence how the nodes are split. So I would like
> to go further with the above 2 options.
>
> Thank you again!
>
> Betty
>
>
>
>
> On 1/28/07, Weiwei Shi <helprhelp@gmail.com> wrote:
> > Dear Betty:
> >
> > I could suggest 3 options:
> >
> > 1. try to use rf in fortran by following the linky below
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> >
> > 2. make a wrapper function to do the down sampling by yourself
> >
> > 3. try to use cutoff in randomForest, which might help in your situation.
> >
> > HTH,
> >
> > weiwei
> >
> > On 1/28/07, Betty Health < betty.health@gmail.com> wrote:
> > > Hello there,
> > >
> > > I am working on an extremely unbalanced two-class classification
> > > problem. I want to use "classwt" together with "down sampling". Checking
> > > rfNews() in R, it looks like classwt is not working yet. Then I looked at
> > > the software from Salford, but I did not find a down-sampling option. I am
> > > wondering if you have any experience dealing with this problem. Do you
> > > know of any method or software that can handle it?
> > >
> > > Thank you very much!!
> > >
> > > Betty
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Weiwei Shi, Ph.D
> > Research Scientist
> > GeneGO, Inc.
> >
> > "Did you always know?"
> > "No, I did not. But I believed..."
> > ---Matrix III
> >
>
>

Received on Tue Jan 30 11:54:16 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 30 Jan 2007 - 05:30:26 GMT.
