[R] Question on class 1, 2 output for RandomForest

From: Melanie Vida <mvida_at_mitre.org>
Date: Thu 24 Mar 2005 - 02:05:10 EST

Hi All,

I read the article on page 18 of the R newsletter (Volume 2/3, December 2002) and tried the example there. Then I ran randomForest on a different data set, the "credit" data from the UCI repository. The trace output for the "credit" data has two additional columns, "1" and "2", that the newsletter example using the fgl data set did not show.

For the "credit" data, what do the columns headed "1" and "2" mean for ntree = 100...500 (below)? Does "1" refer to the actual data ("class 1") and "2" to a group of synthetic data ("class 2")? Did my random forest default to unsupervised learning, create the synthetic class 2 data automatically, and then classify the combined data? If so, which method did R use to generate the synthetic data? The newsletter states that there are 2 ways to generate synthetic data.

Further, should the randomForest tuning parameters ideally be chosen to minimize the OOB error rate, and also whatever the column 1 and column 2 error rates mean? I tried mtry = 2, 3 and 10, but that didn't change the errors much. Are these results reasonable, or should I try tuning different parameters for this special case?

ntree      OOB      1      2
  100:  20.72% 14.10% 28.99%
  200:  18.99% 13.58% 25.73%
  300:  19.71% 15.14% 25.41%
  400:  20.00% 14.10% 27.36%
  500:  19.13% 13.58% 26.06%
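One way to check what the "1" and "2" headings in the trace above refer to is to look at the err.rate component of the fitted object. The following is only a minimal sketch under stated assumptions: it uses a two-class subset of the built-in iris data as a stand-in for the credit table (the names `two` and `rf` are hypothetical), and it assumes, as far as I can tell, that the trace headings are the positions of the response's factor levels. With a response supplied through a formula, the printed "Type of random forest: classification" suggests a supervised fit, so no synthetic class should be involved.

```r
library(randomForest)

# Stand-in two-class data (hypothetical substitute for the credit table)
two <- droplevels(subset(iris, Species != "setosa"))

set.seed(1)
rf <- randomForest(Species ~ ., data = two, ntree = 500)

# err.rate has one row per tree; its columns are "OOB" plus one per class level
colnames(rf$err.rate)

# Assumed mapping of the positional trace headings to the class labels
data.frame(trace_heading = seq_along(levels(two$Species)),
           class_label   = levels(two$Species))

# Cumulative error rates (in percent) after 100, 200, ..., 500 trees
round(100 * rf$err.rate[seq(100, 500, by = 100), ], 2)
```

For the credit fit, the same lookup would be colnames(credit.rf$err.rate) and levels(credit$V16).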

Call:
 randomForest(x = V16 ~ ., data = credit, mtry = 3, importance = TRUE,      do.trace = 100)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of error rate: 19.86%
Confusion matrix:

Thanks in advance,

-Melanie



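library(randomForest)

# Read in the credit table
credit = read.table(url('ftp://ftp.ics.uci.edu/pub/machine-learning-databases/credit-screening/crx.data'), sep=",")
str(credit)

# Convert V2 and V14 to numeric.
# Note: both columns use "?" for missing values, so they may be read as factors;
# as.numeric() on a factor returns the integer level codes, not the original values.
credit$V2 = as.numeric(credit$V2)
credit$V14 = as.numeric(credit$V14)
str(credit)

# Fit the forest, printing the running error rates every 100 trees
credit.rf <- randomForest(V16 ~ ., data=credit, mtry=3, importance = TRUE, do.trace=100)
print(credit.rf)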
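The mtry comparison described above can be made side by side rather than one run at a time. This is a rough sketch, not a recommended tuning procedure: it again assumes a two-class subset of the built-in iris data as a stand-in for the credit table, and the names `two`, `mtry_values` and `oob` are hypothetical.

```r
library(randomForest)

# Stand-in two-class data (hypothetical substitute for the credit table)
two <- droplevels(subset(iris, Species != "setosa"))

set.seed(1)
mtry_values <- 1:4   # iris has only 4 predictors

# For each mtry, keep the last row of err.rate:
# the OOB and per-class error rates after all trees have been grown
oob <- sapply(mtry_values, function(m) {
  fit <- randomForest(Species ~ ., data = two, mtry = m, ntree = 500)
  fit$err.rate[fit$ntree, ]
})
colnames(oob) <- paste0("mtry=", mtry_values)

# Rows: OOB and one row per class; columns: one per mtry value (in percent)
round(100 * oob, 2)
```

If the rows barely differ across columns, as with mtry = 2, 3 and 10 on the credit data, the forest is probably not very sensitive to mtry on that data set. The randomForest package also provides tuneRF() for a stepwise search over mtry.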




Received on Thu Mar 24 02:18:27 2005