Re: [R] Memory limits using read.table on Windows XP Pro

From: Latchezar Dimitrov <ldimitro_at_wfubmc.edu>
Date: Sat 25 Jun 2005 - 04:09:10 EST


Thank you very much for your attention. I did check the rw-FAQ, though I did not mention it. Since that is a common requirement, I assumed it was common practice too and decided not to waste bandwidth. Apparently I was wrong. However, from what I presented you can easily (I guess) infer it as well. Your guess about what I used is absolutely correct, as I expected, BTW. Oh yeah, the water is wet, although I did not mention that either :-)

R FAQ Frequently Asked Questions on R
Version 2.1.2005-06-22
ISBN 3-900051-08-9: "7.28 Why is read.table() so inefficient?

By default, read.table() needs to read in everything as character data, and then try to figure out which variables to convert to numerics or factors. For a large data set, this takes condiderable amounts of time and memory. Performance can substantially be improved by using the colClasses argument to specify the classes to be assumed for the columns of the table."

(The vital word "condiderable" above is not explained anywhere, so I guess it means considerable. I think you (all) need to check the spelling of the words you (all) use. Although spelling-checkers are much misused they are sometimes useful.)

Is my use of read.table() in accordance with the above? Can it be improved with respect to my problem?
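
For reference, the FAQ's advice amounts to declaring every column's class up front. Below is a minimal sketch on a made-up three-row table; the column names (id, m1, m2) and the data are hypothetical, and the remaining arguments mirror the call quoted further down as best it can be read. It is not meant as a fix for the memory error, only as an illustration of colClasses.

```r
# Tiny tab-delimited sample standing in for a large genotype table.
# The column names and values are made up for illustration.
txt <- "id\tm1\tm2\ns1\tAA\tAB\ns2\tAB\t.\ns3\tBB\tAA\n"

# Declare each column's class so read.table() does not have to guess types.
# A single value such as colClasses = "factor" would be recycled to all columns.
geno <- read.table(text = txt, header = TRUE, sep = "\t",
                   na.strings = ".", quote = "", comment.char = "",
                   colClasses = c("character", "factor", "factor"),
                   nrows = 3)

str(geno)  # one character column and two factor columns
```

Giving nrows as well lets read.table() size its result in advance, which the read.table help page suggests also helps memory use.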

R for Windows FAQ
Version for rw2011
B. D. Ripley and D. J. Murdoch:
(it does not say Prof. but I guess it is "Prof. B. D. Ripley", isn't it?)

"2.11 There seems to be a limit on the memory it uses!

Indeed there is. It is set by the command-line flag --max-mem-size (see How do I install R for Windows?) and defaults to the smaller of the amount of physical RAM in the machine and 1Gb. It can be set to any amount over 16M. (R will not run in less.) Be aware though that Windows has (in most versions) a maximum amount of user virtual memory of 2Gb, and parts of this can be reserved by processes but not used."

So what, if anything, is wrong in my configuration, settings, parameters, flags, etc. (you name them) with respect to the above?

Although I did not mention it, I know very well the difference between GiB, GB, and Gb (as used in the rw-FAQ, wrongly I suppose), so your guess is incorrect here. In any case my estimates, as you can see, are conservative, so your note does not contribute essential information.
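
To make the units explicit, the estimate from my original message can be redone as a small sketch. It assumes 4 bytes per value (R stores factor codes as 4-byte integers) and ignores all other overhead, so it is a lower bound rather than a measurement.

```r
# Back-of-the-envelope size of the genotype table, using the figures
# from the original message (2500 x 125000 values, 4 bytes each).
cells <- 2500 * 125000   # 312,500,000 values
bytes <- cells * 4       # 1,250,000,000 bytes

gb  <- bytes / 1e9       # decimal gigabytes: 1.25
gib <- bytes / 2^30      # binary gigabytes (GiB): about 1.16

cat(sprintf("%.2f GB = %.2f GiB\n", gb, gib))
```

Either way the figure stays well under 3 x 2^30 bytes even when doubled, which is the sense in which the estimate is conservative.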

Despite your blunder about my knowledge, I suspect that you secretly knew my estimates were conservative, so I wonder why, after your correct interpretation of my e-mail, I did not get a plain answer in plain English.

Best regards,
Latchezar Dimitrov

PS. Please do not reply unless you have help or suggestions for solving the problem (as opposed to comments about my education or experience, remarks on what I did not mention, and other trivia). Thanks

PPS. I also wonder whether you have ever heard of "the magic word", or is there no such thing as magic for Profs?

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley@stats.ox.ac.uk]
> Sent: Friday, June 24, 2005 12:47 PM
> To: Latchezar Dimitrov
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Memory limits using read.table on Windows XP Pro
>
> On Fri, 24 Jun 2005, Latchezar Dimitrov wrote:
>
> > Hello,
> >
> > When I try:
> >
> > geno <- read.table("2500.geno.tab",header=TRUE,sep="\t",na.strings=".",quote="",comment.char="",colClasses=c("factor"),nrows=2501)
> >
> > I get, after hour(s) of work:
> >
> > Error: cannot allocate vector of size 9 Kb
> >
> > I have:
> >
> > Rgui.exe --max-mem-size=3Gb
> >
> > and
> >
> > multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP
> > Professional" /fastdetect /NoExecute=OptIn /PAE /3GB
> >
> > in boot.ini
> >
> > 2500.geno.tab is a tab-delimited text table with 2500 x 125000 =
> > 312,500,000 3-level (two alphabet characters) factors (x 4 bytes =
> > 1,250,000,000, i.e. 1.25GB). Even if we double it (as per read.table
> > help) it's still 2.5GB < 3Gb. And actually Windows Task Manager shows
> > peak mem use for Rgui 2,056,992K (~2.057GB) and total memory used
> > 2.62GB. And the total physical memory is 4GB (of which Windows
> > recognizes above 3GB)
> >
> > Any help or suggestions?
>
> Do check the rw-FAQ. If you modified R to address more than
> 2GB, you omitted to tell us a vital fact, so I guess you did not.
>
> I think you need to check the actual meaning of G and K,
> although they are much misused. 1,250,000,000 is 1.16GB in
> the units you are using for 3GB.
>
> --
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jun 25 04:13:06 2005
