[R] help with RPART

From: Terry Therneau <therneau_at_mayo.edu>
Date: Mon, 02 Jun 2008 10:30:59 -0500 (CDT)


  When using the anova method, all of the printed results are scaled by the residual sum of squares (RSS) of the top node, i.e. the total sum of squares of the response. Therefore the relative error measures for the trees are already 1 - R^2.

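    library(rpart)      # recursive partitioning trees
    library(survival)   # provides the lung data set
    tfit <- rpart(time ~ ., data = lung)
    summary(tfit)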

          CP nsplit rel error   xerror      xstd 
1 0.03665178      0 1.0000000 1.010097 0.1136942
2 0.03310179      1 0.9633482 1.079216 0.1172675
3 0.03029365      2 0.9302464 1.109587 0.1173583
4 0.01963453      3 0.8999528 1.249586 0.1327888
5 0.01627146     11 0.7396726 1.238411 0.1310952
6 0.01507635     12 0.7234012 1.260919 0.1337384
7 0.01031566     13 0.7083248 1.282740 0.1399397
8 0.01000000     14 0.6980091 1.296213 0.1396711

Node number 1: 228 observations,    complexity param=0.03665178
  mean=305.2325, MSE=44176.93
  left son=2 (81 obs) right son=3 (147 obs)
  Primary splits:
      pat.karno < 75    to the left,  improve=0.03661157, (3 missing)
      ph.ecog   < 1.5   to the right, improve=0.03620793, (1 missing)
      status    < 1.5   to the right, improve=0.02930372, (0 missing)
      ph.karno  < 85    to the left,  improve=0.02058114, (1 missing)
      sex       < 1.5   to the left,  improve=0.01679999, (0 missing)
  Surrogate splits:
      ph.ecog  < 1.5   to the right, agree=0.787, adj=0.392, (3 split)
      ph.karno < 75    to the left,  agree=0.751, adj=0.291, (0 split)
      age      < 72.5  to the right, agree=0.680, adj=0.089, (0 split)

Node number 2: 81 observations,    complexity param=0.03310179
  mean=251.0247, MSE=34100.99
  left son=4 (59 obs) right son=5 (22 obs)
  Primary splits:
      wt.loss < 21    to the left,  improve=0.12735970, (7 missing)
      status  < 1.5   to the right, improve=0.08060663, (0 missing)
      age     < 68.5  to the right, improve=0.04906869, (0 missing)
      inst    < 2.5   to the left,  improve=0.04148716, (0 missing)
      sex     < 1.5   to the left,  improve=0.02401074, (0 missing)
  Surrogate splits:
      ph.karno < 55    to the right, agree=0.743, adj=0.095, (6 split)

etc.

  The first split has R^2 = .0367 = 1 - rel error of the one-split tree (top few lines of the table), which is also, to about three digits, the improvement measure printed for the node (improve=0.0366).
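As a rough check of the statements above, the sketch below refits the example, converts the cptable's rel error into R^2, and recomputes the first split's R^2 directly from the node deviances stored in `tfit$frame`. It assumes the rpart and survival packages are installed (the lung data come from survival); `cp`, `r2`, `dev`, and `root_split_r2` are illustrative names, not part of rpart.

```r
library(rpart)     # regression trees; method = "anova" is used for a numeric response
library(survival)  # supplies the lung data used in the post

tfit <- rpart(time ~ ., data = lung)

# "rel error" is each subtree's residual SS divided by the root node's SS,
# so 1 - rel error is that subtree's R^2.
cp <- tfit$cptable
r2 <- 1 - cp[, "rel error"]
print(cbind(nsplit = cp[, "nsplit"], R2 = r2))

# tfit$frame holds each node's deviance (its within-node sum of squares);
# the row names of the frame are the node numbers.
dev <- setNames(tfit$frame$dev, rownames(tfit$frame))

# R^2 of the first split: SS removed by splitting node 1 into nodes 2 and 3,
# relative to the SS at the root.
root_split_r2 <- (dev[["1"]] - dev[["2"]] - dev[["3"]]) / dev[["1"]]
print(root_split_r2)
```

The root deviance is simply the total sum of squares of `time`, which is why dividing by it turns residual SS into 1 - R^2.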

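   The second split has R^2 = .127 for the observations within that node (improve=0.1274); it improves the R^2 for the model as a whole by only .033 (the node's complexity param, and the drop in rel error from 0.963 to 0.930), because node 2 holds only part of the total sum of squares.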
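A similar sketch for the second split, under the same assumptions (rpart and survival installed; `dev`, `ss_drop`, `within_r2`, `overall_gain`, and `share` are illustrative names). It expresses the SS removed by splitting node 2 into nodes 4 and 5 relative to node 2's own SS and relative to the root SS. Because the 7 observations with missing wt.loss are sent down by the surrogate split, the within-node value computed this way need not match the printed improve exactly.

```r
library(rpart)
library(survival)

tfit <- rpart(time ~ ., data = lung)
dev  <- setNames(tfit$frame$dev, rownames(tfit$frame))  # node number -> within-node SS

# SS removed when node 2 is split into its children, nodes 4 and 5
ss_drop <- dev[["2"]] - dev[["4"]] - dev[["5"]]

within_r2    <- ss_drop / dev[["2"]]  # R^2 for the observations inside node 2
overall_gain <- ss_drop / dev[["1"]]  # increase in R^2 for the tree as a whole

# The two are linked by node 2's share of the total SS
share <- dev[["2"]] / dev[["1"]]
print(c(within = within_r2, overall = overall_gain, share = share))
```

In other words, the whole-model gain is the within-node R^2 multiplied by the fraction of the total sum of squares that sits in node 2.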

           Terry T.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 02 Jun 2008 - 17:34:31 GMT
