Re: [R] How to load a big txt file

From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Wed, 06 Jun 2007 20:35:50 -0700

Alex,

See

         R Data Import/Export Version 2.5.0 (2007-04-23)

and search it for 'large' or 'scan'.

Usually, taking care with the arguments

         nlines, what, quote, comment.char

should be enough to get scan() to cooperate.
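
A minimal sketch of what that can look like, not a tested recipe for your file. The small temporary file and its column names (id, s1, s2, s3) are made up to stand in for Tumor.txt, and I am assuming one text column followed by numeric columns:

```r
# Build a small tab-delimited stand-in for Tumor.txt (hypothetical layout:
# one ID column followed by numeric columns).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\ts1\ts2\ts3",
             "g1\t1.5\t2.5\t3.5",
             "g2\t4.0\t5.0\t6.0",
             "g3\t7.0\t8.0\t9.0"), tmp)

# Read the header line on its own to get the column names.
hdr <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)

# Declaring the type of every field spares scan() from guessing.
# quote = "" and comment.char = "" make sure quote and comment
# handling are off, so stray ' or # characters cannot split a field.
what <- c(list(""), rep(list(0), length(hdr) - 1))
names(what) <- hdr
dat <- scan(tmp, what = what, sep = "\t", skip = 1, nlines = 3,
            quote = "", comment.char = "", quiet = TRUE)
str(dat)
```

With a real file, nlines would be the known number of data rows (or simply omitted), and the what list would have to match the actual column types.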

You will need around 1GB RAM to store the result, so if you are working on a machine with less, you will need to upgrade. Consider storing the result as a numeric matrix.
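
For the matrix suggestion, something along these lines may do, assuming every field read is numeric. The file and the column names s1, s2, s3 are invented for illustration:

```r
# Hypothetical all-numeric, tab-delimited file with a header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("s1\ts2\ts3",
             "1.5\t2.5\t3.5",
             "4.0\t5.0\t6.0"), tmp)

hdr  <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)
vals <- scan(tmp, what = 0, sep = "\t", skip = 1, quiet = TRUE)

# scan() returns the values row by row, so fill the matrix by row.
m <- matrix(vals, ncol = length(hdr), byrow = TRUE,
            dimnames = list(NULL, hdr))

# object.size() reports how much memory the matrix occupies.
print(object.size(m))
```

A numeric matrix carries less overhead than a data frame and is what most clustering functions want as input anyway.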

If any of those columns are long strings not needed in your computation, be sure to skip over them. Read the 'Details' of the help page for scan() carefully.
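
Skipping works by putting NULL in the matching position of the 'what' list. A small sketch, again with a made-up file in which 'description' plays the part of a long string column you do not need:

```r
# Hypothetical file: an ID, a long annotation string, two numeric columns.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdescription\ts1\ts2",
             "g1\ta long annotation string\t1.5\t2.5",
             "g2\tanother long annotation\t4.0\t5.0"), tmp)

# A NULL entry in 'what' tells scan() to skip that field entirely,
# so the strings are never stored.
what <- list(id = "", description = NULL, s1 = 0, s2 = 0)
dat <- scan(tmp, what = what, sep = "\t", skip = 1,
            quote = "", comment.char = "", quiet = TRUE)

# Keep only the numeric fields, with the IDs as row names.
m <- cbind(s1 = dat$s1, s2 = dat$s2)
rownames(m) <- dat$id
print(m)
```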

Chuck

On Thu, 7 Jun 2007, ssls sddd wrote:

> Dear list,
>
> I need to read a big txt file (around 130Mb; 23800 rows and 49 columns)
> for downstream clustering analysis.
>
> I first used "Tumor <- read.table("Tumor.txt",header = TRUE,sep = "\t")"
> but it took a long time and failed. However, it had no problem if I just put
> data of 3 columns.
>
> Is there any way which can load this big file?
>
> Thanks for any suggestions!
>
> Sincerely,
> Alex
>
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901

Received on Thu 07 Jun 2007 - 03:48:55 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 07 Jun 2007 - 04:31:44 GMT.
