From: <jenniferbecq_at_free.fr>

Date: Tue 22 Mar 2005 - 03:13:50 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Tue Mar 22 03:18:12 2005

I have a problem using rpart (R 2.0.1 under Unix)

         distance          group3 group4 group5 group6 group7 group8
pos_1    0.141836040224967 a      c      e      a      g      g
pos_501  0.153605961621317 a      a      a      a      g      g
pos_1001 0.152246705384699 a      c      e      a      g      g
pos_1501 0.145563737522463 a      c      e      a      g      g
pos_2001 0.143940027378837 a      c      e      e      g      g

When I use rpart() as follows, the program runs for ages and, after a few hours, R is abruptly killed:

library(rpart)

fit <- rpart(distance ~ ., data = mydata)

When I recode the categorical variables as numeric values (e.g. a = 1, b = 2, c = 3, etc.), the program finishes normally in a few seconds. But this is not what I want, because it then splits my variables on conditions such as "group7 > 4.5" (continuous) rather than "group7 = a,b,d,f" versus "c,e,g" (discrete).
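The difference between the two kinds of split can be reproduced on a toy example. This is only a sketch, not the original data: `toy`, `toy_num`, and the simulated `distance` values are made up for illustration, and only a single predictor named `group7` is used.

```r
library(rpart)

set.seed(1)
n <- 200

# Made-up data: one categorical predictor with levels a..g
g <- sample(letters[1:7], n, replace = TRUE)
toy <- data.frame(
  # distance is shifted upward for levels c, e, g (an assumption for the demo)
  distance = rnorm(n, mean = ifelse(g %in% c("c", "e", "g"), 0.16, 0.14), sd = 0.005),
  group7   = factor(g)
)

# Factor predictor: rpart splits on subsets of levels
fit_factor <- rpart(distance ~ group7, data = toy)

# Same data with the levels recoded as integers (a = 1, b = 2, ...):
# rpart now treats group7 as continuous and splits on thresholds
toy_num <- transform(toy, group7 = as.integer(group7))
fit_numeric <- rpart(distance ~ group7, data = toy_num)

print(fit_factor)
print(fit_numeric)
```

Comparing the two printed trees shows whether a predictor was treated as discrete (level subsets) or continuous (thresholds).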

Here is the result:

> fit

n= 9271

node), split, n, deviance, yval

When I create a small data frame like the example above, e.g.:

distance = rnorm(5,0.15,0.01)
group3 = c("a","a","a","a","a")
group4 = c("c","a","c","c","c")
group5 = c("e","a","e","e","e")
group6 = c("a","a","a","a","e")

smalldata = data.frame(cbind(distance,group3,group4,group5,group6))

The program runs normally in a few seconds.
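One thing worth checking in the small example: cbind() on a mix of numeric and character vectors returns a character matrix, so distance no longer reaches the data frame as a numeric column. Whether mydata was built this way is not stated in this post, so the sketch below is only a check to run, not a diagnosis. The names `via_cbind` and `direct` are illustrative.

```r
set.seed(42)
distance <- rnorm(5, 0.15, 0.01)
group3 <- c("a", "a", "a", "a", "a")
group4 <- c("c", "a", "c", "c", "c")

# cbind() coerces everything to character before data.frame() sees it,
# so distance ends up as a factor or character column, never numeric
via_cbind <- data.frame(cbind(distance, group3, group4))

# Passing the vectors directly keeps distance numeric;
# the predictors are made factors explicitly
direct <- data.frame(
  distance,
  group3 = factor(group3),
  group4 = factor(group4)
)

# Compare the column classes of the two constructions
print(sapply(via_cbind, class))
print(sapply(direct, class))
```

Running sapply(mydata, class) on the large dataset in the same way shows how each column is stored. If distance turns out to be a factor, rpart() will by default fit a classification tree (method = "class") rather than a regression tree.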

Why does it work on the large dataset with only numeric values, but not with categorical predictor variables?

I thank you all for your time and help,

