Re: [R] Rpart, custom penalty for an error

From: Prof Brian Ripley
Date: Sun 10 Sep 2006 - 19:36:10 GMT

On Sun, 10 Sep 2006, Maciej Bliziński wrote:

> Hello all R-help list subscribers,
> I'd like to create a regression tree of a data set with binary response
> variable. Only 5% of observations are a success, so the regression tree
> will not find really any variable value combinations that will yield
> more than 50% of probability of success.

This would be a misuse of a regression tree: a binary response is exactly the problem for which classification trees were designed.

> I am however interested in areas where the probability of success is
> noticeably higher than 5%, for example 20%. I've tried rpart and the
> weights option, increasing the weights of the success-observations.

You are 'misleading' rpart by using 'weights', claiming to have case weights for cases you do not have. You need to use misclassification costs instead (in rpart, the 'loss' matrix passed via 'parms').

This is a standard issue, discussed in all good books on classification (including mine).
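
A minimal sketch of the misclassification-cost approach, assuming rpart's parms = list(loss = ...) interface for a classification tree. The data set, the variable names and the 4:1 cost ratio are made up for illustration. With a loss of 4 for a missed success against 1 for a false alarm, a leaf is labelled 'success' once its success proportion exceeds 1/(1 + 4) = 20%.

```r
library(rpart)

# Hypothetical data: 1000 cases, successes are rare (about 2%) for
# x <= 0.8 and more common (25%) for x > 0.8.
i <- 1:1000
success <- (i > 800 & i %% 4 == 0) | (i <= 800 & i %% 50 == 0)
d <- data.frame(
  x = i / 1000,
  y = factor(ifelse(success, "success", "failure"),
             levels = c("failure", "success"))
)

# Loss matrix: rows are the true class, columns the predicted class,
# in the order of the factor levels (failure, success).
# A true failure predicted as success costs 1;
# a true success predicted as failure costs 4.
loss <- matrix(c(0, 1,
                 4, 0), nrow = 2, byrow = TRUE)

# Classification tree with unequal misclassification costs, no case weights.
fit <- rpart(y ~ x, data = d, method = "class",
             parms = list(loss = loss))
print(fit)

# Predicted class in the low-rate and the high-rate region.
pred <- predict(fit, newdata = data.frame(x = c(0.1, 0.9)), type = "class")
print(pred)
```

Because no case weights are involved here, the counts reported for each node are the observed ones.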

> It works as expected in terms of the tree creation: instead of a single
> root, a tree is being built. But the tree plot() and text() are somewhat
> misleading. I'm interested in the observation counts inside each leaf.
> I use the "use.n = TRUE" parameter. The counts displayed are misleading,
> the numbers of successes are not the original numbers from the sample,
> they seem to be cloned success-observations.

They _are_ the original numbers, for that is what 'case weights' means: a case with weight k is treated as k identical observed cases.
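
To illustrate the point, here is a small sketch on made-up data (the variable names and the weight of 4 are assumptions). It compares the two node totals that an rpart fit stores, the number of rows (frame$n) and the sum of case weights (frame$wt), and then tabulates the unweighted class counts per leaf from the 'where' component.

```r
library(rpart)

# Hypothetical data: 1000 cases, 66 of them successes.
i <- 1:1000
success <- (i > 800 & i %% 4 == 0) | (i <= 800 & i %% 50 == 0)
d <- data.frame(
  x = i / 1000,
  y = factor(ifelse(success, "success", "failure"),
             levels = c("failure", "success"))
)

# Case weights: each success is declared to stand for 4 identical cases.
w <- ifelse(d$y == "success", 4, 1)
fit_w <- rpart(y ~ x, data = d, weights = w, method = "class")

# Root node: number of rows versus sum of case weights.
print(fit_w$frame[1, c("n", "wt")])

# fit_w$where gives, for every row of d, the row of fit_w$frame holding
# the leaf it falls into, so this table has the unweighted counts per leaf.
raw_counts <- table(leaf = fit_w$where, class = d$y)
print(raw_counts)
```

The same table() call works on a tree fitted with a loss matrix instead of weights.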

> I'd like to split the tree just as weights parameter allows me to,
> keeping the original number of observations in the tree plot. Is it
> possible? If yes, how?
> Kind regards,
> Maciej

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Mon Sep 11 05:39:07 2006
