[R] Memory problems with large dataset in rpart

From: <vheijst_at_few.eur.nl>
Date: Tue 18 Oct 2005 - 14:54:10 EST

Dear helpers,

I am a Dutch student at Erasmus University. For my Bachelor's thesis I have written an R script that performs boosting with classification and regression trees, using the predefined function rpart. My input file consists of about 4000 vectors, each with 2210 dimensions. The first two iterations run without any problems, but in the third iteration R reports that it has run out of memory, even though every variable is removed from memory at the end of each iteration.

My computer runs Windows XP and has 1 gigabyte of RAM. I tried to let R use more memory by reconfiguring the swap files as mentioned in the FAQ (/3gb), but I did not succeed in making this work. The command round(memory.limit()/1048576.0, 2) returns 1023.48, i.e. a limit of about 1 GB.
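A rough size estimate helps put the error in context. It assumes that every one of the 2210 predictors is stored as an 8-byte double, since the post does not give the column types. Under that assumption, one copy of the data takes about 67 MB. The roughly 1 GB limit reported above therefore has room for only about fifteen such copies. The formula interface can also create working copies of the data, such as the model frame, so the real headroom may be smaller than it first appears.

```latex
\[
4000 \times 2210 = 8\,840\,000 \text{ values}, \qquad
8\,840\,000 \times 8 \text{ bytes} = 70\,720\,000 \text{ bytes} \approx 67.4 \text{ MB}
\]
\[
\frac{1023.48 \text{ MB}}{67.4 \text{ MB per copy}} \approx 15 \text{ copies}
\]
```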

If the memory limit cannot be increased, perhaps the size of the rpart object could be reduced by not storing unnecessary information. This is the rpart call (the FALSE arguments are an attempt to reduce the size of the fit object):
fit <- rpart(price ~ ., data = trainingset, control=rpart.control(maxdepth=2,cp=0.001),model=FALSE,x=FALSE,y=FALSE)

The fit object is later passed to two predict calls, for example: predict(fit, newdata = sample)
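Here is a minimal R sketch of a boosting loop written to keep memory use down. The post does not say which boosting variant is used, so this sketch assumes least-squares boosting: each tree is fitted to the current residuals. The simulated `trainingset`, the learning rate `shrink` and the iteration count `n_iter` are hypothetical stand-ins. The extra `rpart.control` settings `xval = 0`, `maxcompete = 0` and `maxsurrogate = 0` are existing rpart options. They switch off internal cross-validation and the search for competing and surrogate splits. Whether these settings are enough to avoid the error on the full 4000 x 2210 data has not been tested here.

```r
library(rpart)

# Hypothetical stand-in data: 200 rows, 50 numeric predictors X1..X50
set.seed(1)
n <- 200
p <- 50
trainingset <- data.frame(matrix(rnorm(n * p), n, p))
trainingset$price <- 2 * trainingset$X1 + rnorm(n)

# Shallow trees as in the original call, plus settings that skip internal
# cross-validation and the search for competing and surrogate splits
ctrl <- rpart.control(maxdepth = 2, cp = 0.001, xval = 0,
                      maxcompete = 0, maxsurrogate = 0)

n_iter <- 3      # hypothetical number of boosting iterations
shrink <- 0.1    # hypothetical learning rate
y      <- trainingset$price          # keep the original response
pred   <- rep(mean(y), nrow(trainingset))

for (m in seq_len(n_iter)) {
  # Least-squares boosting: fit the next tree to the current residuals,
  # overwriting the response column instead of copying the data frame
  trainingset$price <- y - pred
  fit <- rpart(price ~ ., data = trainingset, control = ctrl,
               model = FALSE, x = FALSE, y = FALSE)
  pred <- pred + shrink * predict(fit, newdata = trainingset)
  rm(fit)          # drop the tree before the next iteration
  invisible(gc())  # run a garbage collection so freed memory can be reused
}
trainingset$price <- y               # restore the original response
```

The loop overwrites the response column in place instead of building a second data frame of residuals. The aim is to avoid holding two full sets of predictor columns at the same time.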

Can anybody help me either to let R use more memory (for example, swap space) or to reduce the size of the fit object?

Kind regards
Dennis van Heijst
Student Informatics & Economics
Erasmus University Rotterdam
The Netherlands

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Oct 18 14:59:47 2005
