Re: [R] How to read HUGE data sets?

From: Emmanuel Charpentier <charpent_at_bacbuc.dyndns.org>
Date: Thu, 28 Feb 2008 15:47:31 +0100

Jorge Iván Vélez wrote:
> Dear R-list,
>
> Does somebody know how I can read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4GB. After reading it, I need to delete
> some specific rows and columns. I'm running R 2.6.2 patched over XP SP2
> using a 2.4 GHz Core 2-Duo processor and 4GB RAM. Any suggestion would be
> appreciated.

Hmmm... Unless you're running a 64-bit version of XP, you might be SOL (notwithstanding the astounding feats of the R Core Team, which managed to make about 3.5 GB of memory usable under 32-bit Windows): your *raw* data alone will eat more than the available memory. You might get lucky if some of the data can be abstracted (e.g. long character strings that can be reduced to vectors), or unlucky (large R storage overhead for nonreducible data).
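One rough way to gauge that overhead before committing to a full read is to load a small sample, compare its in-memory size with the bytes it occupies on disk, and scale up to the whole file. The sketch below is only an illustration: the file, its columns, and the helper name `estimate_memory` are made up for the demonstration, and the extrapolation is crude (a small sample tends to overstate per-row overhead).

```r
# Hypothetical small tab-delimited file standing in for the real data.
path <- tempfile(fileext = ".txt")
demo <- data.frame(rs = paste0("rs", 1:1000),
                   pos = 1:1000,
                   genotype = rep(c("AA", "AG", "GG", "NN"), 250))
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Estimate the memory (in bytes) the whole file would need once read,
# by extrapolating from the first `sample_rows` rows.
estimate_memory <- function(path, sample_rows = 100) {
  sample_df <- read.table(path, header = TRUE, sep = "\t",
                          nrows = sample_rows, quote = "",
                          comment.char = "", stringsAsFactors = FALSE)
  # Bytes taken on disk by the header plus the sampled rows
  # (+1 per line as an approximation for the line terminator).
  sample_lines <- readLines(path, n = sample_rows + 1)
  bytes_on_disk <- sum(nchar(sample_lines, type = "bytes") + 1)
  # In-memory bytes per on-disk byte, scaled to the full file size.
  ratio <- as.numeric(object.size(sample_df)) / bytes_on_disk
  ratio * file.size(path)
}

est <- estimate_memory(path)
cat("Estimated in-memory size:", round(est / 1024), "KB\n")
```

If the estimate already exceeds what the machine can address, reading the whole file in one go is unlikely to work.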

You might consider changing machines: get a 64-bit machine with gobs of memory and cross your fingers. Note that, since R pointers are then 64 bits wide instead of 32, data storage needs will inflate...

Depending on the real meaning of your data and the processing they need, you might also consider storing your raw data in an SQL DBMS, reducing them in SQL, and reading into R only the relevant part(s). There are also some contributed packages that might help in special situations: biglm, birch.
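A minimal sketch of the DBMS route, assuming the DBI and RSQLite packages are installed and using SQLite as the DBMS (any other SQL engine would do). The file, the table name `hapmap`, the column names, and the helper `load_in_chunks` are all hypothetical stand-ins. The text file is pushed into the database a few rows at a time, so the full data set never has to sit in R's memory, and only the wanted rows and columns are pulled back.

```r
library(DBI)

# Hypothetical tab-delimited file standing in for the real 4 GB text file.
path <- tempfile(fileext = ".txt")
demo <- data.frame(rs = paste0("rs", 1:10),
                   chrom = rep(c("chr1", "chr2"), each = 5),
                   pos = seq(100, 1000, by = 100),
                   NA06985 = rep(c("AA", "AG"), 5),
                   stringsAsFactors = FALSE)
write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Append the file to a database table, `chunk_rows` lines at a time.
load_in_chunks <- function(db, path, table, chunk_rows = 4) {
  infile <- file(path, open = "r")
  on.exit(close(infile))
  header <- strsplit(readLines(infile, n = 1), "\t", fixed = TRUE)[[1]]
  repeat {
    lines <- readLines(infile, n = chunk_rows)
    if (length(lines) == 0) break
    # Read every column as character so all chunks share the same types.
    chunk <- read.table(text = lines, sep = "\t", col.names = header,
                        colClasses = "character", quote = "",
                        comment.char = "")
    dbWriteTable(db, table, chunk, append = TRUE)
  }
  invisible(NULL)
}

db <- dbConnect(RSQLite::SQLite(), tempfile(fileext = ".sqlite"))
load_in_chunks(db, path, "hapmap")

# Do the reduction in SQL: keep two columns and the rows for one chromosome.
kept <- dbGetQuery(db,
  "SELECT rs, NA06985 FROM hapmap WHERE chrom = 'chr1'")
dbDisconnect(db)
print(kept)
```

For the real file, `chunk_rows` would be set much higher (tens of thousands of lines or more), and the `SELECT` would name whichever rows and columns are actually needed.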

HTH,                                         Emmanuel Charpentier



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 28 Feb 2008 - 14:51:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 28 Feb 2008 - 15:30:17 GMT.
