[R] Reading a large csv file row by row

From: Yuchen Luo <realityrandom_at_gmail.com>
Date: Fri 06 Apr 2007 - 08:41:36 GMT

Hi, my friends.

When a data file is large, loading the whole file into memory at once is not feasible. A workable alternative is to read one row, process it, store the result, and then read the next row.

In Fortran, the 'read' statement reads one line of a file by default, which is convenient: each time the same 'read' statement is executed, the next row of the file is read.

I tried to replicate such row-by-row reading in R. I use scan() with the "skip = xxx" option. It takes only seconds when the number of rows is within 1000, but hours to read 10000 rows. I think this is because every time R reads, it has to start from the first row of the file and count xxx rows to find the row it needs, so locating each successive row takes longer and longer.
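A minimal sketch of the pattern described above (the file name "big.csv" is hypothetical). The skip-based call re-reads the file from the top on every invocation, so reading n rows costs roughly O(n^2) total work; keeping a file connection open instead lets each scan() resume where the previous one stopped, much like Fortran's sequential 'read':

```r
## Skip-based approach: each call rescans the file from line 1.
## row_k <- scan("big.csv", what = "character", sep = ",",
##               skip = k, nlines = 1, quiet = TRUE)

## Connection-based approach: the open connection keeps its position,
## so successive scan() calls read successive rows.
con <- file("big.csv", open = "r")
while (length(row <- scan(con, what = "character", sep = ",",
                          nlines = 1, quiet = TRUE)) > 0) {
  ## process one row here
}
close(con)
```
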

Is there a solution to this problem?

Your help will be highly appreciated!
Best Wishes
 Yuchen Luo


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Fri Apr 06 18:49:06 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 06 Apr 2007 - 10:30:55 GMT.
