Re: [Rd] R's IO speed

From: Roger D. Peng <rpeng_at_jhsph.edu>
Date: Sat 01 Jan 2005 - 09:57:14 EST

On a ~1.45 million row x 122 column data frame (one "character", one "factor", and the rest "numeric" columns) I can read it into R 2.0.1 using read.csv() in about 150 seconds; memory usage is ~1.5 GB. This is read in using the `nrows', `comment.char = ""', and `colClasses' arguments. On R-devel (2004-12-31), it takes about 120 seconds; memory usage is the same. Not too shabby!
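
A minimal sketch of that kind of call follows. The file, its column order (one character column, one factor column, then 120 numeric columns) and the small row count are illustrative assumptions, not the data set described above. A stand-in file is generated first so the snippet runs on its own:

```r
# Hypothetical stand-in file: the real one had ~1.45 million rows.
n <- 1000
df <- data.frame(id  = sprintf("id%04d", seq_len(n)),
                 grp = sample(c("a", "b", "c"), n, replace = TRUE),
                 matrix(rnorm(n * 120), n, 120))   # columns X1..X120
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Assumed layout: one character, one factor, then 120 numeric columns.
classes <- c("character", "factor", rep("numeric", 120))
dat <- read.csv(path,
                nrows = n,              # known (or mildly over-estimated) row count
                comment.char = "",      # do not scan for comment characters
                colClasses = classes)   # no per-column type guessing
str(dat[, 1:3])
```

The read.table help page (which also covers read.csv) notes that supplying nrows, even as a mild over-estimate, helps memory usage, and that comment.char = "" is appreciably faster than the read.table default.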

-roger

Prof Brian Ripley wrote:
> R-devel now has some improved versions of read.table and write.table.
>
> For a million-row data frame containing one numeric column, one factor
> column with few levels and one logical column (a 56Mb object):
>
> generating it takes 4.5 secs.
>
> calling summary() on it takes 2.2 secs.
>
> writing it takes 8 secs and an additional 10Mb.
>
> saving it in .rda format takes 4 secs.
>
> reading it naively takes 28 secs and an additional 240Mb.
>
> reading it carefully (using nrows, colClasses and comment.char) takes 16
> secs and an additional 150Mb (56Mb of which is for the object read in).
> (The overhead of read.table over scan was about 2 secs, mainly in the
> conversion back to a factor.)
>
> loading from .rda format takes 3.4 secs.
>
> [R 2.0.1 read in 23 secs using an additional 210Mb, and wrote in 50 secs
> using an additional 450Mb.]
>
>
> Will Frank Harrell or someone else please explain to me a real
> application in which this is not fast enough?
>
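The steps timed in the quoted message can be reproduced roughly as follows. This is a hypothetical reconstruction, because the message does not include the code used. The row count is reduced from one million so it runs quickly, and the timings it prints depend on the machine, so they will not match the figures quoted:

```r
# Hypothetical reconstruction of the benchmark; not the original code.
n <- 1e5  # the quoted figures are for 1e6 rows
df <- data.frame(x = rnorm(n),                                        # one numeric column
                 f = factor(sample(letters[1:5], n, replace = TRUE)), # factor, few levels
                 b = sample(c(TRUE, FALSE), n, replace = TRUE))       # one logical column

path <- tempfile(fileext = ".txt")
rda  <- tempfile(fileext = ".rda")

t_write <- system.time(write.table(df, path, row.names = FALSE))
t_save  <- system.time(save(df, file = rda))

# "Naive" read: read.table has to guess the column types and the row count.
t_naive <- system.time(d1 <- read.table(path, header = TRUE))

# "Careful" read: nrows, colClasses and comment.char supplied up front.
t_careful <- system.time(
  d2 <- read.table(path, header = TRUE, nrows = n,
                   colClasses = c("numeric", "factor", "logical"),
                   comment.char = ""))

t_load <- system.time(load(rda))  # restores `df` from the .rda file

timings <- c(write = t_write[["elapsed"]], save = t_save[["elapsed"]],
             naive = t_naive[["elapsed"]], careful = t_careful[["elapsed"]],
             load = t_load[["elapsed"]])
timings  # elapsed seconds for each step
```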

R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Jan 01 09:03:12 2005