Re: R-beta: read.table and large datasets

Date: Mon, 9 Mar 1998 11:11:25 -0800 (PST)
From: Thomas Lumley <thomas@biostat.washington.edu>
To: Rick White <rick@stat.ubc.ca>
Subject: Re: R-beta: read.table and large datasets
In-Reply-To: <350439E6.2972AACD@stat.ubc.ca>

On Mon, 9 Mar 1998, Rick White wrote:

> I find that read.table cannot handle large datasets. Suppose data is a
> 40000 x 6 dataset
> 
> R -v 100
> 
> x <- read.table("data")  gives
> Error: memory exhausted
> but
> x <- as.data.frame(matrix(scan("data"), byrow=TRUE, ncol=6))
> works fine.

You need to increase the number of cons cells as well as the vector heap
size, e.g.

R -v 40 -n 1000000

to allocate 1000000 cons cells instead of the standard 200000.
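
Incidentally, scan() can also read column-wise, which may need less
memory than building the matrix first. A minimal sketch (the column
names x1..x6 are made up for illustration):

# read six numeric columns straight into a named list, then convert;
# this skips the intermediate 40000 x 6 matrix copy
x <- as.data.frame(scan("data",
                        what = list(x1=0, x2=0, x3=0, x4=0, x5=0, x6=0)))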

To see what sort of memory you are running out of, use gcinfo(TRUE), which
tells R to report the memory status after each garbage collection.
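
For example, a minimal sketch (assuming the same "data" file as above):

gcinfo(TRUE)             # print free cons cells and heap after each gc
x <- read.table("data")  # watch which resource is exhausted first
gc()                     # force a collection and report current usage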


Thomas Lumley
------------------------
Biostatistics		
Uni of Washington	
Box 357232		
Seattle WA 98195-7232	
------------------------

