[R] imbalanced classes

From: Mark D'Ascenzo <mdd9_at_cornell.edu>
Date: Thu 26 Jan 2006 - 10:56:24 EST


Hi Andy,

I know this topic has been discussed before on R-help, but I was wondering if you could offer some advice specific to my application.

I'm using the R randomForest package to compare two classes of data. The number of cases in each class is relatively low: 28 in class 1 and 9 in class 2. I'd really like to use the R environment to analyze this data, but I'm finding it difficult to put much trust in the results of my analysis. As you've stated, the classwt argument does not do much, and I've tried working with the cutoff and sampsize arguments as well, with limited success in balancing the error rates between the two classes.

It was unclear to me how to use the cutoff parameter correctly. If you have any recommendations here, they would be appreciated. Additionally, with the sampsize argument, I have tried a few values, for example sampsize = c(2, 6) and c(9, 3). It wasn't clear to me whether I should be sampling more from the larger class or the other way around.
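
For concreteness, the two options in question could be sketched as follows. This is only an illustration on simulated data, not the real data set: the data, ntree, and the particular sampsize and cutoff values are assumptions chosen for the example, not recommendations. The entries of sampsize and cutoff follow the order of levels(y).

```r
library(randomForest)

set.seed(1)

# Simulated stand-in data (an assumption, NOT the real data set):
# 28 cases in class 1 and 9 cases in class 2, with 5 made-up predictors.
n1 <- 28
n2 <- 9
x <- data.frame(matrix(rnorm((n1 + n2) * 5), ncol = 5))
x[(n1 + 1):(n1 + n2), 1] <- x[(n1 + 1):(n1 + n2), 1] + 1.5  # shift one predictor for class 2
y <- factor(rep(c("1", "2"), c(n1, n2)))

# Option 1: stratified sampling with the same number of cases drawn from
# each class for every tree. The value 7 is illustrative and is kept
# below the minority class size of 9.
rf_bal <- randomForest(x, y, ntree = 500, strata = y, sampsize = c(7, 7))

# Option 2: change the voting cutoff. The winning class is the one with the
# largest ratio of vote proportion to cutoff, so with c(0.7, 0.3) class 2 is
# predicted once it receives more than 30% of the votes.
rf_cut <- randomForest(x, y, ntree = 500, cutoff = c(0.7, 0.3))

# Out-of-bag confusion matrices, including the per-class error rates
print(rf_bal$confusion)
print(rf_cut$confusion)
```

Comparing the class.error column of the two confusion matrices shows how each setting shifts the balance of errors between the classes.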

Lastly, I'm wondering if you are currently working on, or plan to release in the near future, an R version of randomForest that is equivalent to the FORTRAN rf5 package. It works wonderfully for my application, but getting data in and out of it, changing parameters, and recompiling is a pain, as I'm sure you agree.

Your thoughts would be greatly appreciated.

Kind regards,

Mark D'Ascenzo
Biomedical Engineering
Cornell University
Ithaca, NY 14853


