Re: R-beta: read.table and large datasets

Douglas Bates (bates@stat.wisc.edu)
09 Mar 1998 12:56:02 -0600


To: Rick White <rick@stat.ubc.ca>
Subject: Re: R-beta: read.table and large datasets
From: Douglas Bates <bates@stat.wisc.edu>
Date: 09 Mar 1998 12:56:02 -0600
In-Reply-To: Rick White's message of Mon, 09 Mar 1998 18:50:14 +0000

Rick White <rick@stat.ubc.ca> writes:

> I find that read.table cannot handle large datasets. Suppose data is a
> 40000 x 6 dataset
> 
> R -v 100
> 
> x_read.table("data")  gives
> Error: memory exhausted
> but
> x_as.data.frame(matrix(scan("data"),byrow=T,ncol=6))
> works fine.
> 
> read.table is less typing, I can include the variable names in the first
> line, and in Splus it executes faster. Is there a fix for read.table on
> the way?

You probably need to increase -n as well as -v to read in this table.
Try setting 
 gcinfo(TRUE)
to see what is happening with the garbage collector.  Most likely it
is running out of cons cells long before it runs out of heap storage.
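For example, something along these lines (the values are illustrative, not
tuned; in R of this vintage, -v sets the vector heap size in megabytes and
-n the number of cons cells):

  R -v 100 -n 1000000
  > gcinfo(TRUE)          # print gc statistics after each collection
  > x_read.table("data")

The gcinfo(TRUE) output should show whether it is the cons cells or the
vector heap that fills up first.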

I suspect this because I encountered exactly the same situation several
weeks ago, and Thomas Lumley pointed this out to me.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._