From: Robert Citek <rwcitek_at_alum.calberkeley.org>

Date: Sat 06 May 2006 - 02:30:15 EST

On May 5, 2006, at 10:24 AM, Robert Citek wrote:

> R > foo <- read.delim("dataset.010MM.txt")
>
> R > summary(foo)
>      X15623
> Min.   :    1
> 1st Qu.: 8152
> Median :16459
> Mean   :16408
> 3rd Qu.:24618
> Max.   :32766

I reloaded the 10 MM set and ran object.size():

R > object.size(foo)

[1] 440000376

So, 10 MM numbers take up about 440 MB. That works out to roughly 44 bytes per value, not the 4 bytes of a raw 32-bit word (8 bits/byte * 4 bytes = 32 bits), so the data.frame representation is presumably adding considerable overhead. Either way, it explains why the 10 MM set works while 100 MM won't: at the same rate, 100 MM values would need about 4.4 GB, beyond the ~4 GB address space of a 32-bit machine.
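For what it's worth, a quick way to sanity-check the per-value cost in a live session (hypothetical session; "foo" is the data frame loaded above) is to compare the data frame against a bare numeric vector:

R > as.numeric(object.size(foo)) / nrow(foo)       # bytes per value, ~44 here
R > as.numeric(object.size(numeric(1e6))) / 1e6    # bare double vector, ~8 bytes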

From Googling the archives, the usual solution for working with large data sets seems to be moving to a 64-bit architecture. Short of that, are there any other generic workarounds, perhaps using an RDBMS or a CRAN package, that enable working with arbitrarily large data sets?
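To frame the question, here is the kind of workaround I have in mind: a minimal sketch in base R, assuming a one-column file with a header line (the file name and chunk size are placeholders), which streams the file in chunks and keeps only running statistics in memory:

con <- file("dataset.100MM.txt", open = "r")
invisible(readLines(con, n = 1))            # skip the header line
chunk <- 1e6                                # values read per pass
total <- 0; n <- 0
repeat {
  x <- scan(con, what = numeric(0), n = chunk, quiet = TRUE)
  if (length(x) == 0) break                 # end of file
  total <- total + sum(x)                   # accumulate running sums only
  n <- n + length(x)
}
close(con)
total / n                                   # mean without holding all 100 MM values

An RDBMS route (e.g. via a CRAN database interface) would play the same trick, pushing the aggregation out of R's address space entirely.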

Regards,

- Robert

http://www.cwelug.org/downloads

Help others get OpenSource software. Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Sat May 06 02:46:17 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Sat 06 May 2006 - 06:10:03 EST.
