[R] confusion matrix in randomForest

From: Miklos Kiss <mzkiss_at_gmail.com>
Date: Sat, 19 Jul 2008 19:46:37 -0700 (PDT)

I have a question about the output generated by randomForest in classification mode, specifically the confusion matrix. The confusion matrix lists the classes and how the forest classified the cases in each one, plus the classification error per class. Are these numbers essentially averages over all the trees in the forest? If so, is there a way to get the standard deviations out of the randomForest object, or do I have to evaluate each tree individually?

By way of illustration, the confusion matrix for the iris data is shown below. It shows that the forest correctly classified 47 versicolor irises, but this is the result for the entire forest. I'd like to know whether every tree correctly classifies 47 versicolor irises; I don't think it will. The same goes for the class.error values. Not every tree will have exactly those values, right?

But this raises another question. For this example I used the entire data set to generate the forest, so I assume the confusion matrix is based on OOB (out-of-bag) data. If I instead set aside a test set, grew the forest on the remaining training set, and evaluated the trees individually on the test set, could I then get averages and standard deviations of the error rate?
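
A minimal sketch of the per-tree evaluation I have in mind, assuming a hypothetical 100/50 train/test split of iris and an arbitrary seed. It uses the predict.all = TRUE argument of predict() for randomForest objects, which returns each tree's prediction alongside the aggregated vote:

```r
library(randomForest)

set.seed(42)                              # arbitrary seed, for reproducibility only
train_idx <- sample(nrow(iris), 100)      # hypothetical split: 100 train, 50 test
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500)

# predict.all = TRUE returns a list: $aggregate (forest vote) and
# $individual (one row per test case, one column per tree)
pred <- predict(rf, newdata = test, predict.all = TRUE)

# Misclassification rate of each tree on the test set
tree_err <- colMeans(pred$individual != as.character(test$Species))

mean(tree_err)   # average error rate of the individual trees
sd(tree_err)     # spread of the error rate across trees

# Error rate of the forest's majority vote, for comparison
forest_err <- mean(pred$aggregate != test$Species)
forest_err
```

The mean of the per-tree error rates is not the same quantity as the error rate of the forest's vote, so the two numbers need not agree.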

Any thoughts? Thanks in advance.

-Miklos Z. Kiss

> print(iris.rf)

 randomForest(formula = Species ~ ., data = iris, importance = TRUE, keep.forest = TRUE)

               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 5.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          5        45        0.10
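
For reference, a sketch of how the matrix above relates to the components of the fitted object, assuming the forest is fitted as in the call shown (the seed is arbitrary, so the exact counts may differ from the output above). According to the randomForest documentation, the predicted component holds the out-of-bag predictions and the confusion component is built from them, so cross-tabulating the observed classes against predicted should reproduce the counts:

```r
library(randomForest)

set.seed(1)   # arbitrary seed; counts may differ from the printed output
iris.rf <- randomForest(Species ~ ., data = iris,
                        importance = TRUE, keep.forest = TRUE)

# Each case is predicted by a vote among only the trees for which it was out of bag
oob_table <- table(observed = iris$Species, predicted = iris.rf$predicted)

oob_table                # counts rebuilt from the OOB predictions
iris.rf$confusion        # stored matrix: same counts plus class.error

# Cumulative OOB error after 1, 2, ..., 500 trees; the last row is the
# value reported as "OOB estimate of error rate"
final_oob <- iris.rf$err.rate[iris.rf$ntree, "OOB"]
final_oob
```

If this is right, the entries are counts from a single vote per case, not averages over trees.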
Received on Sun 20 Jul 2008 - 07:04:21 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 22 Jul 2008 - 03:32:02 GMT.
