Re: [R] clara - memory limit

From: Huntsinger, Reid <reid_huntsinger_at_merck.com>
Date: Thu 04 Aug 2005 - 03:21:45 EST


I thought setting keep.data=FALSE might help, but running this on a 32-bit Linux machine, the R process uses about 1.2 GB until just before clara returns, when it jumps to 1.9 GB, regardless of whether keep.data is FALSE or TRUE. Possibly it's overhead from the .C() interface, but that's mostly an uninformed guess.
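For reference, this is roughly how I ran it (a sketch, not my exact script; `mydata` stands in for the full data set):

library(cluster)

gc(reset = TRUE)                                 # reset the "max used" figures
fit <- clara(mydata, k = 7, keep.data = FALSE)
gc()                                             # compare "max used" against the baseline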

You could sample your data (say half of it), remove the original from memory, run clara on the sample, keep the medoids, then read your data back in and assign each observation to the nearest medoid. This is essentially what clara does anyway, just with much smaller samples by default.
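A rough sketch of that approach in R (untested on your data; it assumes all 15 columns are numeric so squared Euclidean distance makes sense, and object names other than clara() and read.dbf() are only illustrative):

library(foreign)   # read.dbf()
library(cluster)   # clara()

mydata <- read.dbf(file = "fnorsel_4px.dbf")
idx    <- sample(nrow(mydata), nrow(mydata) %/% 2)   # roughly half the rows
sub    <- as.matrix(mydata[idx, ])
rm(mydata); gc()                                     # drop the full data before clustering

fit  <- clara(sub, k = 7, keep.data = FALSE)
meds <- fit$medoids                                  # k x 15 matrix of medoids
rm(sub); gc()

## Re-read the full data and assign every row to its nearest medoid.
## The cross-product trick below gives squared Euclidean distances as an
## n x k matrix, which stays small even for 3,000,000 rows and k = 7.
X  <- as.matrix(read.dbf(file = "fnorsel_4px.dbf"))
d2 <- outer(rowSums(X^2), rowSums(meds^2), "+") - 2 * X %*% t(meds)
cluster.id <- max.col(-d2)                           # index of the nearest medoid per row

The assignment step only ever holds an n x k matrix, so it should fit in memory even when a full dissimilarity matrix would not.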

Reid Huntsinger

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Nestor Fernandez
Sent: Wednesday, August 03, 2005 12:45 PM
To: r-help@stat.math.ethz.ch
Subject: [R] clara - memory limit

Dear all,

I'm trying to estimate clusters from a very large dataset using clara but the
program stops with a memory error. The (very simple) code and the error:

library(foreign)   # read.dbf()
library(cluster)   # clara()
mydata <- read.dbf(file="fnorsel_4px.dbf")
my.clara.7k <- clara(mydata, k=7)

>Error: cannot allocate vector of size 465108 Kb

The dataset contains >3,000,000 rows and 15 columns. I'm using a Windows computer with 1.5 GB RAM; I also tried changing the memory limit to the maximum possible (4000M).
Is there a way to calculate clara clusters from such large datasets?

Thanks a lot.

Nestor.-



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Thu Aug 04 03:31:17 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:39:39 EST