Re: [R] Reading in large file in pieces

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 23 Dec 2005 - 18:41:10 EST

On Thu, 22 Dec 2005, Sean Davis wrote:

> I have a large file (millions of lines) and would like to read it in pieces.
> The file is logically separated into little modules, but these modules do
> not have a common size, so I have to scan the file to know where they are.
> They are independent, so I don't have to read one at the end to interpret
> one at the beginning. Is there a way to read one line at a time and parse
> it on the fly and do so quickly, or do I need to read say 100k lines at a
> time and then work with those? Only a small piece of each module will
> remain in memory after parsing is completed on each module.
>
> My direct question is: Is there a fast way to parse one line at a time
> looking for breaks between "modules", or am I better off taking large but
> manageable chunks from the file and parsing that chunk all at once?

On any reasonable OS (you have not told us yours), it will make no difference, as the file reads will be buffered. That assumes you are doing something like opening a connection and calling readLines(n=1), of course.
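A minimal sketch of the connection-based approach Prof Ripley describes, reading one line at a time. The module separator (a blank line here) and the process_module() helper are assumptions for illustration; the original post does not say how modules are delimited:

```r
## Hypothetical sketch: read a large file line by line via a connection,
## accumulating lines until an (assumed) module boundary, then keeping
## only a small summary of each module.
process_module <- function(lines) {
  ## placeholder: in practice, parse the module and keep what you need
  length(lines)
}

con <- file("big_file.txt", open = "r")
buffer  <- character(0)
results <- list()
repeat {
  line <- readLines(con, n = 1)
  if (length(line) == 0) {        # zero-length result means end of file
    if (length(buffer) > 0)
      results[[length(results) + 1]] <- process_module(buffer)
    break
  }
  if (line == "") {               # assumed separator between modules
    if (length(buffer) > 0)
      results[[length(results) + 1]] <- process_module(buffer)
    buffer <- character(0)
  } else {
    buffer <- c(buffer, line)
  }
}
close(con)
```

Because the OS buffers the underlying reads, calling readLines(con, n = 1) in a loop like this costs little more than reading large chunks; the same loop works with n = 100000 if chunked processing turns out to be more convenient.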

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Dec 23 18:57:57 2005
