Re: [R] confusion matrix in randomForest

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Mon, 21 Jul 2008 21:41:08 -0400

randomForest predictions are based on the votes of the individual trees, so the confusion matrix is not an average over trees and has little to do with the error rates of individual trees.
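
For illustration, here is a minimal sketch of that distinction, assuming the randomForest package is installed. The train/test split, the seed, and the variable names are illustrative choices and not part of the original thread. It first checks that the printed confusion matrix is a tabulation of the forest's out-of-bag (OOB) vote, then uses predict(..., predict.all = TRUE) to get each tree's own predictions on a held-out set, from which a mean and standard deviation of per-tree error rates can be computed.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Illustrative split: 100 training rows, 50 held-out rows
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500)

# The confusion matrix is built from the OOB majority vote per case,
# not from an average of per-tree confusion matrices.
oob_table <- table(observed = train$Species, predicted = rf$predicted)
all(oob_table == rf$confusion[, levels(train$Species)])

# predict.all = TRUE returns the forest vote ($aggregate) together with
# every tree's own prediction ($individual: one row per case, one column per tree).
pred <- predict(rf, newdata = test, predict.all = TRUE)

# Error rate of each individual tree on the held-out data
tree_err <- colMeans(pred$individual != as.character(test$Species))

# Error rate of the forest's majority vote on the same data
forest_err <- mean(as.character(pred$aggregate) != as.character(test$Species))

c(forest = forest_err, tree_mean = mean(tree_err), tree_sd = sd(tree_err))
```

The spread in tree_err describes how much single trees vary, which is a different quantity from the uncertainty of the forest's own error rate, since the forest error comes from the combined vote.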

Andy

> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Miklos Kiss
> Sent: Saturday, July 19, 2008 10:47 PM
> To: r-help_at_r-project.org
> Subject: [R] confusion matrix in randomForest
>
>
> I have a question on the output generated by randomForest in
> classification mode, specifically, the confusion matrix. The confusion
> matrix lists the various classes and how the forest classified each
> one, plus the classification error. Are these numbers essentially
> averages over all the trees in the forest? If so, is there a way I can
> get the standard deviation values out of the randomForest, or do I
> have to evaluate each tree individually? By way of illustration, let
> me show the confusion matrix using the iris data. The output below
> shows that the forest correctly classified 47 versicolor irises, but
> this is the result for the entire forest. I'd like to know if every
> tree will have 47 correctly classified versicolor irises, but I don't
> think it will. Same for the class.error value. Not every tree will
> have those exact same values, right?
>
> But this raises another question. For this example, I used the entire
> data set to generate the forest, and so I assume that the confusion
> matrix is based on OOB data, so if I created a training set and
> evaluated trees individually in the test set I could get averages and
> standard deviations on the error rate.
>
> Any thoughts? Thanks in advance.
>
> -Miklos Z. Kiss
>
> > print(iris.rf)
> Call:
>  randomForest(formula = Species ~ ., data = iris, importance = TRUE,
>      keep.forest = TRUE)
>                Type of random forest: classification
>                      Number of trees: 500
> No. of variables tried at each split: 2
>
>         OOB estimate of error rate: 5.33%
> Confusion matrix:
>            setosa versicolor virginica class.error
> setosa         50          0         0        0.00
> versicolor      0         47         3        0.06
> virginica       0          5        45        0.10
> --
> View this message in context:
> http://www.nabble.com/confusion-matrix-in-randomForest-tp18550873p18550873.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Received on Tue 22 Jul 2008 - 01:53:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 22 Jul 2008 - 04:32:10 GMT.