**From:** Andy Bunn (*abunn@montana.edu*)

**Date:** Fri 11 Apr 2003 - 03:36:04 EST

**Next message:** Shutnik: "[R] R help"

**Previous message:** Washington Santos da Silva: "[R] Exact confidence intervals based on the hypergeometric distribuiton"

**Next in thread:** Prof Brian Ripley: "Re: [R] Classification problem - rpart"

**Reply:** Prof Brian Ripley: "Re: [R] Classification problem - rpart"

Message-id: <001201c2ff87$acaecd40$e6a00ecf@simATE>

I am performing a binary classification using a classification tree. Ironically, the data themselves are the locations of 2483 trees (real biological ones), as described by a suite of environmental variables (slope, soil moisture, radiation load, etc.). I want to separate them from an equal number of random points. Doing EDA on the data shows that there is a substantial difference between the tree and random classes, e.g., box and whisker plots for slope show separation.

The data frame is thus:

curvegrid,dir2tl,dist2tl,slope,tasp,tci10,class

-0.000244141,266,1852.701,2.382412,0.2124468,131,random

0.3005371,246,1146.342,10.45694,0.8045813,63,random

.

.

.

.

-0.3000488,90,10,20.25561,-0.1293357,62,tree

-0.5,90,10,18.68057,-0.05228489,61,tree

-0.6994629,0,0,18.30121,0.0320744,66,tree

I've run rpart on similar data without an issue, but when I try it on these data as follows:

tree <- rpart(class ~ curvegrid + slope + tci10, method="class")

I get the following output:

> tree

n= 4966

node), split, n, loss, yval, (yprob)

* denotes terminal node

1) root 4966 2483 dw (0.500000000 0.500000000)

2) slope=0.3206026,0.5159777,0.679302,0.7163697,1.1324.......... 2574 94 dw (0.963480963 0.036519037) *

3) slope=0,0.1011371,0.1013844,0.2027681,0.2267014,0.32......... MISSING 2392 3 random (0.001254181 0.998745819) *

This is not like other trees I have run!

And:

> summary(tree)

Call:

rpart(formula = class ~ curvegrid + slope + tci10)

n= 4966

CP nsplit rel error xerror xstd

1 0.9609344 0 1.00000000 1.0322191 0.01418310

2 0.0100000 1 0.03906565 0.7635924 0.01378822

Node number 1: 4966 observations, complexity param=0.9609344

predicted class=dw expected loss=0.5

class counts: 2483 2483

probabilities: 0.500 0.500

left son=2 (2574 obs) right son=3 (2392 obs)

Primary splits:

slope splits as RRRRRRLRRRLRRRRLLRRRRRRR.......

tci10 splits as RRRRRRRRRRLLRLLRLLRLLRLL.......

etc.

Node number 2: 2574 observations

predicted class=dw expected loss=0.03651904

class counts: 2480 94

probabilities: 0.963 0.037

Node number 3: 2392 observations

predicted class=random expected loss=0.001254181

class counts: 3 2389

probabilities: 0.001 0.999

I'm assuming that I have to adjust something in rpart.control. I am also hesitant about posting prematurely but am in fetters.
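One check that may be worth making before touching rpart.control: splits printed as `slope=0.3206026,0.5159777,...` and as `splits as RRRRRRL...` are how rpart reports a split on a categorical predictor, so `slope` may have been read in as a factor rather than as a number. A non-numeric entry in the file (the `MISSING` in the listing above looks like a candidate) would be enough to cause that. The following is only a minimal sketch under that assumption; the name `dat`, the inline `csv_text`, and the toy values are hypothetical stand-ins for the real file.

```r
# Toy stand-in for the real file; `csv_text`, `dat` and the values are
# hypothetical, chosen only to reproduce a numeric column with one
# non-numeric entry.
csv_text <- "slope,tci10,class
2.382412,131,random
MISSING,63,random
20.25561,62,tree
18.68057,61,tree"

# Read the columns as plain text first, so the result does not depend on
# the stringsAsFactors default of the R version in use.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
sapply(dat, class)   # slope is reported as "character", not "numeric"

# Re-read, declaring the non-numeric marker as a missing value. slope is
# now numeric (with one NA) and class is a factor, as rpart expects for
# method = "class".
dat <- read.csv(text = csv_text,
                na.strings = c("NA", "MISSING"),
                stringsAsFactors = TRUE)
sapply(dat, class)

# With numeric predictors the tree would then be fitted as, for example:
# library(rpart)
# fit <- rpart(class ~ slope + tci10, data = dat, method = "class")
```

If `sapply(dat, class)` on the real data frame already reports `slope` and `tci10` as numeric, this explanation does not apply and the cause lies elsewhere.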

Thanks in advance, Andy

