[R] Classification problem - rpart


From: Andy Bunn (abunn@montana.edu)
Date: Fri 11 Apr 2003 - 03:36:04 EST


Message-id: <001201c2ff87$acaecd40$e6a00ecf@simATE>

I am performing a binary classification using a classification tree.
Ironically, the data themselves are the locations of 2483 trees (real,
biological ones), each described by a suite of environmental variables
(slope, soil moisture, radiation load, etc.). I want to separate them from
an equal number of random points. Exploratory data analysis shows a
substantial difference between the tree and random classes; e.g., box and
whisker plots for slope show separation.

The data frame looks like this:

curvegrid,dir2tl,dist2tl,slope,tasp,tci10,class
-0.000244141,266,1852.701,2.382412,0.2124468,131,random
0.3005371,246,1146.342,10.45694,0.8045813,63,random
...
-0.3000488,90,10,20.25561,-0.1293357,62,tree
-0.5,90,10,18.68057,-0.05228489,61,tree
-0.6994629,0,0,18.30121,0.0320744,66,tree
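A possible pitfall when reading a file like this (a sketch under stated assumptions, not part of the original post): if even one entry in a column is not a number, read.csv() will not read that column as numeric. The CSV text below is a hypothetical excerpt modeled on the rows above, and the `MISSING` token placed in the slope column is an assumption (the word does appear in the printed tree further down). Passing `na.strings` turns such a token into NA so the column stays numeric.

```r
# Hypothetical excerpt of the CSV. The MISSING token in the slope column is an
# assumption, used to show how one non-numeric entry changes the column type.
csv_text <- "curvegrid,dir2tl,dist2tl,slope,tasp,tci10,class
-0.000244141,266,1852.701,2.382412,0.2124468,131,random
0.3005371,246,1146.342,10.45694,0.8045813,63,random
-0.3000488,90,10,MISSING,-0.1293357,62,tree
-0.5,90,10,18.68057,-0.05228489,61,tree"

# Naive read: the single MISSING token makes slope non-numeric
# (a factor in older versions of R, character in R >= 4.0).
naive <- read.csv(text = csv_text)
print(sapply(naive, class))

# Treat MISSING as NA, so slope is read as a numeric column.
fixed <- read.csv(text = csv_text, na.strings = "MISSING")
print(sapply(fixed, class))
```

Checking `sapply(df, class)` or `str(df)` right after reading is a cheap way to catch this before fitting any model.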

I've run rpart on similar data without any issues, but when I run it on
these data as follows:

tree <- rpart(class ~ curvegrid + slope + tci10, method="class")

I get the following output:

> tree
n= 4966

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 4966 2483 dw (0.500000000 0.500000000)
  2) slope=0.3206026,0.5159777,0.679302,0.7163697,1.1324.......... 2574 94 dw (0.963480963 0.036519037) *
  3) slope=0,0.1011371,0.1013844,0.2027681,0.2267014,0.32......... MISSING 2392 3 random (0.001254181 0.998745819) *

This is not like other trees I have run!
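The split labels above list individual slope values instead of a threshold of the form `slope< c`. rpart prints splits this way for factor (categorical) predictors, which suggests, though the post does not confirm it, that slope and tci10 are being treated as categorical. The sketch below uses simulated data (not the original data) and assumes the recommended rpart package is installed; it fits the same values once as numeric and once as a factor.

```r
library(rpart)

set.seed(1)
n <- 200
# Simulated stand-in data (not the original): slope tends to be larger for "tree".
sim <- data.frame(
  class = factor(rep(c("random", "tree"), each = n / 2)),
  slope = c(rnorm(n / 2, mean = 5, sd = 3), rnorm(n / 2, mean = 15, sd = 3))
)

# Fit 1: slope is numeric, so the root split is printed as a threshold.
fit_num <- rpart(class ~ slope, data = sim, method = "class")

# Fit 2: the same values coerced to a factor (what happens when a column is
# read in as text); the root split is now printed as a list of levels.
sim$slope_f <- factor(sim$slope)
fit_fac <- rpart(class ~ slope_f, data = sim, method = "class")

print(fit_num)
print(fit_fac)

# In the splits matrix, "ncat" is +/-1 for a continuous split and the number
# of factor levels for a categorical split.
print(fit_num$splits[1, "ncat"])
print(fit_fac$splits[1, "ncat"])
```

With every value its own level, the factor split can put each training point on the "right" side, which is one way to get a nearly pure split like the one above.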

And the summary:

> summary(tree)
Call:
rpart(formula = class ~ curvegrid + slope + tci10)
  n= 4966

         CP nsplit  rel error    xerror       xstd
1 0.9609344      0 1.00000000 1.0322191 0.01418310
2 0.0100000      1 0.03906565 0.7635924 0.01378822

Node number 1: 4966 observations, complexity param=0.9609344
  predicted class=dw expected loss=0.5
    class counts: 2483 2483
   probabilities: 0.500 0.500
  left son=2 (2574 obs) right son=3 (2392 obs)
  Primary splits:
     slope splits as RRRRRRLRRRLRRRRLLRRRRRRR.......
     tci10 splits as RRRRRRRRRRLLRLLRLLRLLRLL.......

etc.

Node number 2: 2574 observations
  predicted class=dw expected loss=0.03651904
    class counts: 2480 94
   probabilities: 0.963 0.037

Node number 3: 2392 observations
  predicted class=random expected loss=0.001254181
    class counts: 3 2389
   probabilities: 0.001 0.999
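As a check on the printed CP table (arithmetic derived from the class counts shown above, not stated in the original post): rel error is the number of misclassified cases divided by the root-node loss of 2483, and CP for the first row is the drop in rel error gained by that one split.

```latex
\text{rel error}_{1} = \frac{94 + 3}{2483} = \frac{97}{2483} \approx 0.03907,
\qquad
\text{CP}_{1} = \frac{1 - 0.03907}{1} \approx 0.96093
```

Both match the table. The cross-validated error of the same one-split tree is 0.7636, roughly 20 times its resubstitution error, which is the pattern one would expect if the split is fitting individual slope values rather than a genuine threshold.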

I assume I need to adjust something in rpart.control. I'm also hesitant
about posting prematurely, but I'm stuck.
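If the cause is indeed that slope and tci10 are stored as factors, the remedy is probably a type conversion rather than a change to rpart.control. A minimal sketch on a hypothetical four-row data frame `dat` (the helper `to_numeric` is also hypothetical): as.numeric() applied directly to a factor returns the internal level codes, so the values have to go through as.character() first.

```r
# Hypothetical data frame in which slope and tci10 were read in as factors.
dat <- data.frame(
  slope = factor(c("2.382412", "10.45694", "MISSING", "18.68057")),
  tci10 = factor(c("131", "63", "62", "61")),
  class = factor(c("random", "random", "tree", "tree"))
)

# Pitfall: this gives level codes (levels sort as "131","61","62","63"), not values.
print(as.numeric(dat$tci10))  # 1 4 3 2

# Convert via the labels; non-numeric tokens such as MISSING become NA.
to_numeric <- function(x) suppressWarnings(as.numeric(as.character(x)))

dat$slope <- to_numeric(dat$slope)
dat$tci10 <- to_numeric(dat$tci10)
str(dat)
```

After converting, refitting with something like `rpart(class ~ curvegrid + slope + tci10, data = dat, method = "class")` should give threshold splits. rpart's default na.action, na.rpart, keeps rows with missing predictors rather than dropping them.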

Thanks in advance, Andy

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help



This archive was generated by hypermail 2.1.3 : Tue 01 Jul 2003 - 09:11:41 EST