[R] Size of data vs. needed memory...rule of thumb?

From: WILLIE, JILL <JILWIL_at_SAFECO.com>
Date: Thu 25 Jan 2007 - 02:04:56 GMT


I have been searching all day and most of last night, but can't find any benchmarks or recommendations on R system requirements for very large (2-5GB) data sets to help guide our hardware configuration. If anybody has experience with this that they're willing to share, or could point me in a direction that would be productive to research, it would be much appreciated. Specifically: Will R simply use as much memory as the OS makes available to it, without limit? Is there a multi-threaded version of R, or are there packages that provide multi-threading? Does core R support 64-bit, and should I expect any difference in how memory is handled under that version? Is 3GB of memory per 1GB of data a reasonable ballpark?

Our testing thus far has been on a 32-bit Windows box w/1GB of RAM & 1 CPU; it appears to indicate something like 3GB of RAM for every 1GB of SQL table (excluding indexes, with byte-sized factors). At this point, we're planning on setting up a dual-core 64-bit Linux box w/16GB of RAM for starters, since our summed-down SQL tables are generally approx 2-5GB.
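
As a rough cross-check on the 3:1 figure, the sketch below estimates the size of a data frame from its column types and compares it with what object.size() reports. It assumes R's usual storage widths (8 bytes per double, 4 bytes per integer or factor code); estimate_df_bytes() and the column names are hypothetical, made up for illustration. One thing it suggests is that a byte-sized factor in the SQL table takes 4 bytes per row once it is in R, which may account for part of the growth.

```r
# Rough in-memory size of a data frame, assuming R's usual storage widths:
# 8 bytes per double, 4 bytes per integer or factor code.
# estimate_df_bytes() is a hypothetical helper, not part of R.
estimate_df_bytes <- function(n_rows, n_double = 0, n_int = 0) {
  n_rows * (8 * n_double + 4 * n_int)
}

n <- 100000L
df <- data.frame(
  loss   = numeric(n),                                # double: 8 bytes per row
  claims = integer(n),                                # integer: 4 bytes per row
  region = factor(rep(c("E", "W"), length.out = n))   # factor code: 4 bytes per row
)

estimated <- estimate_df_bytes(n, n_double = 1, n_int = 2)
actual    <- as.numeric(object.size(df))

print(estimated)
print(actual)   # a little larger: vector headers, names, factor levels
```

The estimate covers only the finished object. Temporary copies made while the result is fetched and converted come on top of it, so peak usage can be noticeably higher.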

Here are the details, just for context, in case I'm misinterpreting the results, or in case there's some more memory-efficient way to get data into R's binary format than going w/a data.frame.

R session:

> library(RODBC)
> channel<-odbcConnect("psmrd")
> FivePer <- data.frame(sqlQuery(channel, "select * from AUTCombinedWA_BILossCost_5per"))

		Error: cannot allocate vector of size 2000 Kb
		In addition: Warning messages:
		1: Reached total allocation of 1023Mb: see help(memory.size)
		2: Reached total allocation of 1023Mb: see help(memory.size)
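
If the full table is not needed in memory at once, one possible workaround for the allocation error above is to pull the result in batches and keep only a running summary. The sketch below shows the pattern with a stand-in data source so it runs without a database; process_in_chunks() and make_fetcher() are hypothetical helpers, not RODBC functions. With RODBC, the fetcher could presumably be built on sqlQuery(..., max = n) followed by repeated sqlGetResults(channel, max = n) calls, but that wiring is an assumption and is not tested here.

```r
# process_in_chunks() is a hypothetical helper: it calls fetch_chunk() until
# no rows remain and folds each chunk into a running result, so only one
# chunk is held in memory at a time.
process_in_chunks <- function(fetch_chunk, combine, init) {
  acc <- init
  repeat {
    chunk <- fetch_chunk()
    # Stop on an empty chunk or on any non-data-frame end marker.
    if (!is.data.frame(chunk) || nrow(chunk) == 0L) break
    acc <- combine(acc, chunk)
  }
  acc
}

# Stand-in data source (assumption): hands out an in-memory data frame
# chunk_rows rows at a time, then returns a zero-row data frame.
make_fetcher <- function(data, chunk_rows) {
  pos <- 0L
  function() {
    if (pos >= nrow(data)) return(data[0L, , drop = FALSE])
    idx <- seq(pos + 1L, min(pos + chunk_rows, nrow(data)))
    pos <<- pos + length(idx)
    data[idx, , drop = FALSE]
  }
}

big <- data.frame(loss = as.numeric(1:10), claims = rep(1L, 10))

# Running column totals, computed four rows at a time.
totals <- process_in_chunks(
  fetch_chunk = make_fetcher(big, chunk_rows = 4L),
  combine     = function(acc, chunk) acc + colSums(chunk),
  init        = c(loss = 0, claims = 0)
)
print(totals)
```

This only helps when the analysis can be expressed as a summary over batches; a model fit that needs every row at once still needs the whole table in memory.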

ODBC connection:

                Microsoft SQL Server ODBC Driver Version 03.86.1830

		Data Source Name: psmrd
		Data Source Description: 
		Server: psmrdcdw01\modeling
		Database: OpenSeas_Work1
		Language: (Default)
		Translate Character Data: Yes
		Log Long Running Queries: No
		Log Driver Statistics: No
		Use Integrated Security: Yes
		Use Regional Settings: No
		Prepared Statements Option: Drop temporary procedures on disconnect
		Use Failover Server: No
		Use ANSI Quoted Identifiers: Yes
		Use ANSI Null, Paddings and Warnings: Yes
		Data Encryption: No

Please be patient, I'm a new R user (or at least I'm trying to be... at this point I'm mostly a new R-help reader). I'd appreciate being pointed in the right direction if this isn't the right help list for this question, or if the question is poorly worded (I did read the posting guide).

Jill Willie
Open Seas
Safeco Insurance
jilwil@safeco.com


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu Jan 25 13:48:30 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 25 Jan 2007 - 08:30:30 GMT.
