**From:** Carolin Strobl (*carolin.strobl@gmx.de*)

**Date:** Sat 08 May 2004 - 02:14:44 EST

**Next message:** Uwe Ligges: "Re: [R] Quantile of a function"

**Previous message:** Uwe Ligges: "Re: [R] Start problem"

Message-id: <9623.1083946484@www24.gmx.net>

Hi,

I have a technical question about rpart:

According to Breiman et al. (1984), different misclassification costs in CART can be modelled either by modifying the loss matrix or by using different prior probabilities for the classes. This in turn should have the same effect as using different weights for the response classes.
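For reference, the "altered priors" idea behind this equivalence can be written as follows. The notation is introduced here for illustration:

- $\pi_i$ is the prior of class $i$.
- $L(i,j)$ is the cost of classifying a true class-$i$ case as class $j$.
- $L_i$ is the corresponding row sum.

$$
L_i = \sum_{j} L(i,j), \qquad
\tilde\pi_i = \frac{\pi_i L_i}{\sum_k \pi_k L_k}.
$$

With two classes each row has a single off-diagonal entry, so $L_i$ is simply the cost of misclassifying class $i$. Fitting with priors $\pi$ plus loss $L$ should then correspond to fitting with priors $\tilde\pi$ and unit losses.

Likewise, giving every case of class $i$ a weight $w_i$ changes the class frequencies, and hence the default (data-estimated) priors, to $\pi_i w_i / \sum_k \pi_k w_k$. This is the sense in which weights act like priors.

This is a sketch of the textbook relation; whether rpart applies it in exactly this form is assumed, not verified.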

What I tried was this:

```r
library(rpart)
data(kyphosis)

# fit1: original, unweighted data set
fit1 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# modified loss matrix (filled column-wise)
# resulting matrix (columns = true class?, rows = predicted class?):
#      [,1] [,2]
# [1,]    0    2
# [2,]    1    0
loss <- matrix(c(0, 1, 2, 0), nrow = 2, ncol = 2)

# modified priors
prior <- c(1/3, 2/3)

fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              parms = list(loss = loss))
fit3 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              parms = list(prior = prior))
fit2
fit3

par(mfrow = c(2, 1))
plot(fit2)
text(fit2, use.n = TRUE)
plot(fit3)
text(fit3, use.n = TRUE)

# These lead to similar but not identical trees (similar topology but
# different cutoff points), while all other combinations (even a complete
# reversal, i.e. preference for the other class) lead to totally
# different trees...

# third approach, using weights:
# sort the data so that the weight vector can be built class by class
ind <- order(kyphosis[, 1])
kyphosis1 <- kyphosis[ind, ]
summary(kyphosis1[, 1])          # 64 absent, 17 present
weight <- c(rep(1, 64), rep(2, 17))
summary(as.factor(weight))

fit4 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis1,
              weights = weight)
# This leads to a result very similar to fit2 with
#   loss <- matrix(c(0, 1, 2, 0), nrow = 2, ncol = 2)
# (same tree and cutoff points, but slightly different probabilities;
# maybe a numerical artefact?)
fit4
plot(fit4)
text(fit4, use.n = TRUE)

# double-check with the reversed (transposed) loss matrix
loss <- matrix(c(0, 1, 2, 0), nrow = 2, ncol = 2, byrow = TRUE)
fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              parms = list(loss = loss))
weight <- c(rep(2, 64), rep(1, 17))
fit4 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis1,
              weights = weight)
fit2
fit4
# again the same, except for the probabilities (yprob)
```
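As a worked check, assume the class counts implied by the weight vector above (64 `absent`, 17 `present`). The default priors are then $\pi = (64/81,\ 17/81)$.

Doubling the cost of misclassifying `present` cases (or giving them weight 2) gives

$$
\tilde\pi = \frac{(64 \cdot 1,\ 17 \cdot 2)}{98} \approx (0.653,\ 0.347).
$$

Doubling the cost for `absent` instead gives $\tilde\pi = (128,\ 17)/145 \approx (0.883,\ 0.117)$. Neither equals `prior = c(1/3, 2/3)`.

Conversely, reaching $\tilde\pi_{\text{present}}/\tilde\pi_{\text{absent}} = 2$ from the observed ratio $17/64$ requires

$$
\frac{L_{\text{present}}}{L_{\text{absent}}} = 2 \cdot \frac{64}{17} = \frac{128}{17} \approx 7.53.
$$

So the prior vector in `fit3` corresponds to a cost ratio of roughly 7.5:1 rather than 2:1, which may explain why its tree differs. Under this reading, `prior = c(64, 34)/98` would be the prior vector matching the 2:1 weighting.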

I don't see

1. why the approach using prior probabilities doesn't work, and
2. what causes the differences in predicted probabilities in the weights approach.
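On question 2, one possible explanation can be sketched with CART's usual node-probability formula. The notation is introduced here: $n_j(t)$ is the number of class-$j$ cases in node $t$, and $N_j$ is the number of class-$j$ cases overall.

$$
p(j \mid t) = \frac{\pi_j\, n_j(t)/N_j}{\sum_k \pi_k\, n_k(t)/N_k}.
$$

With the default priors $\pi_j = N_j/N$ this reduces to the raw proportion $n_j(t)/n(t)$. Weighting `present` cases by 2 acts like priors $\propto (64, 34)$, which gives

$$
p(\text{present} \mid t) = \frac{2\,n_p(t)}{n_a(t) + 2\,n_p(t)}.
$$

For a hypothetical node with 10 `absent` and 5 `present` cases, that is $10/20 = 0.5$ instead of $5/15 \approx 0.333$.

Suppose rpart reports `yprob` under the unaltered priors when only a loss matrix is given (an assumption, not verified against the rpart source). Then the loss and weight fits could share a tree yet print systematically different probabilities, which would not be a numerical artefact.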

Any ideas? Thank you! C.

--

R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


*This archive was generated by hypermail 2.1.3: Mon 31 May 2004 - 23:05:08 EST*