[R] Are minbucket and minsplit rpart options working as expected?

From: Carlos J. Gil Bellosta <cgb_at_datanalytics.com>
Date: Thu 08 Dec 2005 - 06:10:51 EST


Dear r-list:

I am using rpart to build a tree on a dataset. First I obtain a tree that is perhaps too large:

> arbol.bsvg.02 <- rpart(formula, data = bsvg, subset = grp.entr,
+                        control = rpart.control(cp = 0.001))
> arbol.bsvg.02

n= 100000

node), split, n, loss, yval, (yprob)

So I decide to discard branches with fewer than 1000 observations, i.e. 1% of the original number of observations. Therefore, following the rpart.control help page, I set minbucket=1000. However,

> arbol.bsvg.02

n= 100000

node), split, n, loss, yval, (yprob)

And I get an "empty" tree, even though the original tree had branches with more than 1000 observations. Something similar happens if I set minsplit (or both minbucket and minsplit) to a similar value: I end up with the same root-only, branchless tree.
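For reference, here is a minimal sketch of how the two options behave. It uses the kyphosis data shipped with rpart as a stand-in, since the bsvg data, the formula and grp.entr from this post are not available. The helper leaf_sizes and the particular control values are illustrative assumptions, not a diagnosis of the result above. As documented in rpart.control, minsplit is the smallest node size at which a split is attempted, minbucket is the smallest allowed terminal node, and when only one of the two is given the other is derived from it (minsplit = 3 * minbucket, or minbucket = round(minsplit / 3)).

```r
library(rpart)

# Stand-in data: kyphosis (81 rows) ships with rpart. It is NOT the bsvg data.
ctrl_loose <- rpart.control(cp = 0.001, minsplit = 30, minbucket = 10)
fit_loose  <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class", control = ctrl_loose)

# Only minbucket is given here, so rpart.control derives minsplit = 3 * 41 = 123.
ctrl_tight <- rpart.control(cp = 0.001, minbucket = 41)
fit_tight  <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class", control = ctrl_tight)

# Hypothetical helper: sizes of the terminal nodes, read from fit$frame,
# where leaves are the rows whose var is "<leaf>" and n is the node size.
leaf_sizes <- function(fit) fit$frame$n[fit$frame$var == "<leaf>"]

leaf_sizes(fit_loose)   # every value is at least minbucket = 10
ctrl_tight$minsplit     # the derived minsplit
nrow(fit_tight$frame)   # number of nodes in the tight tree
```

With 81 rows, no split can leave 41 observations on both sides, so fit_tight can only be a root node. The same check on the real data (comparing leaf_sizes of the first tree with the requested minbucket, and printing the control list actually used) may help tell whether the size limits or the cp threshold is what stops the splitting.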

Am I misreading something? Can anybody shed some light on the correct usage of minbucket (and/or minsplit) for me?

Sincerely,

Carlos J. Gil Bellosta
http://www.datanalytics.com



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help