[R] rpart question


From: lsjensen@micron.com
Date: Wed 05 May 2004 - 08:59:44 EST


Message-id: <363801FFD7B74240A329CEC3F7FE4CC402B83F9B@ntxboimbx07.micron.com>

I've been wondering about the best way to control for input variables
that have a large number of levels in 'rpart' models. I understand the
algorithm searches through all possible splits (2^(k-1) - 1 for k
levels), so variables with more levels are more likely to produce
good-looking splits. I'm looking for ways to compensate for this extra
complexity.
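To check the split count, here is a minimal Python sketch (a language-neutral illustration, not rpart's own C code) that brute-forces every way to send k unordered levels to two non-empty child nodes. The helper name `binary_splits` is hypothetical. Keeping the first level on the left side stops {A | B} and {B | A} from being counted twice, which leaves 2^(k-1) - 1 distinct splits; the -1 drops the partition that puts every level in one child.

```python
from itertools import combinations


def binary_splits(levels):
    """Return every way to split `levels` into two non-empty groups,
    treating {A | B} and {B | A} as the same split."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    splits = []
    # Pin the first level to the left child so each split is listed once.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = [first, *combo]
            right = [lv for lv in rest if lv not in combo]
            if right:  # both children must be non-empty
                splits.append((tuple(left), tuple(right)))
    return splits


for k in (2, 3, 13):
    # columns: k, splits found by enumeration, 2^(k-1) - 1
    print(k, len(binary_splits(range(k))), 2 ** (k - 1) - 1)
# prints: 2 1 1 / 3 3 3 / 13 4095 4095 (one line each)
```

For the 2-level and 13-level variables in the example that follows, that is 1 candidate split versus 4095, so the 13-level factor gets far more chances to find a split that looks good by chance.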

For example, if two variables produce comparable splits in the data but
one has 2 levels and the other 13, I would like the algorithm to choose
the 'simpler' split.

Is this best done with the 'cost' argument to rpart()? It defaults to
one for all variables, so would it make sense to scale it by nlevels for
each variable, or by sqrt(nlevels), or something similar?
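The rpart documentation describes `cost` as a per-variable scaling: when choosing among splits, the improvement from splitting on a variable is divided by that variable's cost. The Python sketch below mimics only that selection step, using hypothetical improvement values for the 2-level and 13-level variables in the example; `pick_split` and the numbers are invented for illustration and are not rpart output. It compares the default (every cost 1) with cost = nlevels and cost = sqrt(nlevels).

```python
import math


def pick_split(candidates, scale=None):
    """Pick the candidate with the largest improvement / cost.

    candidates: list of (name, improvement, n_levels) tuples (hypothetical data).
    scale: function mapping n_levels to a cost; None gives every variable
    cost 1, like rpart's default.
    """
    best_name, best_score = None, -math.inf
    for name, improvement, n_levels in candidates:
        cost = 1.0 if scale is None else scale(n_levels)
        score = improvement / cost  # improvement divided by the variable's cost
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical: the 13-level factor gives a slightly larger raw improvement.
candidates = [("two_level", 10.0, 2), ("thirteen_level", 10.5, 13)]
print(pick_split(candidates))               # -> thirteen_level (all costs 1)
print(pick_split(candidates, math.sqrt))    # -> two_level (cost = sqrt(nlevels))
print(pick_split(candidates, lambda k: k))  # -> two_level (cost = nlevels)
```

Both scalings flip the choice here, but nlevels penalizes the 13-level factor much more heavily than sqrt(nlevels) does (a cost ratio of 6.5 versus about 2.55), so the strength of the penalty is a judgment call. In R the corresponding vector would need one entry per variable in the model, and numeric predictors, which have no levels, would need a cost of their own (for example 1).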

Thanks,
Landon


______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:07 EST