Re: [R] Re-evaluating the tree in the random forest

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 09 Sep 2005 - 23:39:49 EST


Here's an example, using the iris data:

> ## Grow one tree, using all data, and try all variables at all splits,
> ## using large nodesize to get smaller tree.
> iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1, nodesize=20, mtry=4,
+                         sampsize=150, replace=FALSE)

> getTree(iris.rf, 1)

   left daughter right daughter split var split point status prediction

1              2              3         3        2.45      1          0
2              0              0         0        0.00     -1          1
3              4              5         4        1.75      1          0
4              6              7         3        4.95      1          0
5              8              9         3        4.85      1          0
6             10             11         4        1.65      1          0
7              0              0         0        0.00     -1          3
8              0              0         0        0.00     -1          3
9              0              0         0        0.00     -1          3
10             0              0         0        0.00     -1          2
11             0              0         0        0.00     -1          3

> idx <- with(iris, Petal.Length > 2.45 & Petal.Length < 3.5)
> predict(iris.rf, iris[idx, -5])

[1] versicolor versicolor versicolor
Levels: setosa versicolor virginica
> iris.rf$forest$xbestsplit[1,1] <- 3.5
> predict(iris.rf, iris[idx, -5])

[1] setosa setosa setosa
Levels: setosa versicolor virginica

Note how the predictions have changed.
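To see why this works: predict() simply routes each row through the split
points stored in the forest object, so editing xbestsplit changes the routing
directly. Here is a minimal base-R sketch of that routing logic, written
against the getTree() column layout with a made-up toy tree; it is a
simplified illustration, not the compiled code randomForest actually uses:

```r
## Route rows through a tree stored in the getTree() layout
## (status -1 marks a terminal node). Simplified illustration only;
## randomForest does this in compiled code.
route <- function(tree, x) {
  apply(x, 1, function(row) {
    k <- 1
    while (tree[k, "status"] != -1) {
      k <- if (row[tree[k, "split var"]] <= tree[k, "split point"])
        tree[k, "left daughter"] else tree[k, "right daughter"]
    }
    tree[k, "prediction"]
  })
}

## A toy one-split tree: split on column 1 at 2.45, leaf classes 1 and 2.
toy <- cbind("left daughter"  = c(2, 0, 0),
             "right daughter" = c(3, 0, 0),
             "split var"      = c(1, 0, 0),
             "split point"    = c(2.45, 0, 0),
             "status"         = c(1, -1, -1),
             "prediction"     = c(0, 1, 2))
x <- matrix(c(1.0, 3.0), ncol = 1)
route(toy, x)          # 1 2

## Raising the split point re-routes the second row to the left leaf:
toy[1, "split point"] <- 3.5
route(toy, x)          # 1 1
```

The same mechanism is what re-routes the three iris rows above once the
root split point is moved past their Petal.Length values.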

HTH,
Andy

> -----Original Message-----
> From: Martin Lam [mailto:tmlammail@yahoo.com]
> Sent: Friday, September 09, 2005 9:04 AM
> To: Liaw, Andy; r-help@stat.math.ethz.ch
> Subject: RE: [R] Re-evaluating the tree in the random forest
>
>
> Hi,
>
> Let me give a simple example. Assume a dataset
> containing 6 instances with 1 variable and the class
> label:
>
> [x1, y]:
> [0.5, A]
> [3.2, B]
> [4.5, B]
> [1.4, C]
> [1.6, C]
> [1.9, C]
>
> Assume that the randomForest algorithm creates this
> tree (2 levels deep):
>
> Root node: question: x1 < 2.2?
>
> Left terminal node:
> [0.5, A]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> Leaf classification: C
>
> Right terminal node:
> [3.2, B]
> [4.5, B]
> Leaf classification: B
>
> If I change the question at the root node to "x1 <
> 1?", the instances in the left leaf node are no longer
> routed correctly down the tree. My original question
> was whether there is a way to re-evaluate the
> instances again into:
>
> Root node: question: x1 < 1?
>
> Left terminal node:
> [0.5, A]
> Leaf classification: A
>
> Right terminal node:
> [3.2, B]
> [4.5, B]
> [1.4, C]
> [1.6, C]
> [1.9, C]
> Leaf classification: C
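The re-evaluation described above can be sketched in a few lines of base R:
re-partition the instances at the new threshold and take the majority class
in each leaf. This is a toy illustration of the idea for a single split, not
randomForest code:

```r
## Martin's one-variable example: 6 instances, classes A/B/C.
x1 <- c(0.5, 3.2, 4.5, 1.4, 1.6, 1.9)
y  <- c("A", "B", "B", "C", "C", "C")

## Split at `threshold` and label each leaf with its majority class.
leaf_classes <- function(threshold) {
  left  <- y[x1 <  threshold]
  right <- y[x1 >= threshold]
  c(left  = names(which.max(table(left))),
    right = names(which.max(table(right))))
}

leaf_classes(2.2)   # left "C", right "B"  (the original tree)
leaf_classes(1)     # left "A", right "C"  (after moving the split)
```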
>
> Cheers,
>
> Martin
>
> --- "Liaw, Andy" <andy_liaw@merck.com> wrote:
>
> > > From: Martin Lam
> > >
> > > Dear mailinglist members,
> > >
> > > I was wondering if there was a way to re-evaluate the
> > > instances of a tree (in the forest) again after I have
> > > manually changed a splitpoint (or split variable) of a
> > > decision node. Here's an illustration:
> > >
> > > library("randomForest")
> > >
> > > forest.rf <- randomForest(formula = Species ~ ., data = iris,
> > >                           do.trace = TRUE, ntree = 3, mtry = 2,
> > >                           norm.votes = FALSE)
> > >
> > > # I am going to change the splitpoint of the root node
> > > # of the first tree to 1
> > > forest.rf$forest$xbestsplit[1,]
> > > forest.rf$forest$xbestsplit[1,1] <- 1
> > > forest.rf$forest$xbestsplit[1,]
> > >
> > > Because I've changed the splitpoint, some instances in
> > > the leaves are no longer where they should be. Is there
> > > a way to reassign them to the correct leaf?
> >
> > I'm not sure what you want to do exactly, but I suspect
> > you can use predict().
> >
> > > I was also wondering how I should interpret the output
> > > of do.trace:
> > >
> > > ntree OOB 1 2 3
> > > 1: 3.70% 0.00% 6.25% 5.88%
> > > 2: 3.49% 0.00% 3.85% 7.14%
> > > 3: 3.57% 0.00% 5.56% 5.26%
> > >
> > > What's OOB and what do the percentages mean?
> >
> > OOB stands for `Out-of-bag'. Read up on random forests
> > (e.g., the article in R News) to learn about it. Those
> > numbers are estimated error rates. The `OOB' column is
> > across all data, while the others are for the classes.
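The relationship between the columns can be illustrated with a confusion
matrix: each class column is that class's own error rate, and OOB is the
overall rate. The numbers below are hypothetical, not from the run above:

```r
## Hypothetical confusion matrix (rows = true class, columns = predicted);
## made-up counts, for illustration only.
conf <- matrix(c(50,  0,  0,
                  0, 47,  3,
                  0,  2, 48), nrow = 3, byrow = TRUE,
               dimnames = list(true = 1:3, pred = 1:3))
per_class <- 1 - diag(conf) / rowSums(conf)   # one rate per class column
overall   <- 1 - sum(diag(conf)) / sum(conf)  # the `OOB' column
per_class   # 0.00 0.06 0.04
overall     # about 0.033
```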
> >
> > Andy
> >
> >
> > > Thanks in advance,
> > >
> > > Martin
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html



Received on Fri Sep 09 23:48:03 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:40:09 EST