Re: [R] Suggestion for big files [was: Re: A comment about R:]

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 06 Jan 2006 - 19:08:59 EST

[Just one point extracted: Hadley Wickham has answered the random sample one]

On Thu, 5 Jan 2006, François Pinard wrote:

> [Brian Ripley]
>> One problem with François Pinard's suggestion (the credit has got lost)
>> is that R's I/O is not line-oriented but stream-oriented. So selecting
>> lines is not particularly easy in R.
>
> I understand that you mean random access to lines, instead of random
> selection of lines. Once again, this chat comes out of reading someone
> else's problem; it is not a problem I actually have. SPSS was not
> randomly accessing lines, as data files could well be held on magnetic
> tapes, where random access is not possible in usual practice. SPSS
> reads (or was reading) lines sequentially from beginning to end, and the
> _random_ sample is built as the reading proceeds.
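
The sequential scheme described above (one pass from beginning to end, with the random sample built as lines arrive) can be sketched with reservoir sampling ("Algorithm R"). This is an assumption about the technique: the post does not say which method SPSS used. The sketch is in Python rather than R, and `reservoir_sample` is a hypothetical helper name. It keeps only k lines in memory and never seeks, so it would work on a tape-like stream.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Draw k items uniformly at random from `lines` in a single pass.

    Hypothetical helper illustrating reservoir sampling (Algorithm R):
    memory use is O(k) and the input is read strictly sequentially.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Line i (0-based) enters the sample with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

For example, `with open("big.dat") as f: sample = reservoir_sample(f, 1000)` would sample 1000 lines from a hypothetical file `big.dat` without ever holding the whole file in memory. If the input has fewer than k lines, all of them are returned.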

That was not my point. R's standard I/O is through connections, which allow for pushbacks, changing line endings and re-encoding character sets. That does add overhead compared to C/Fortran line-buffered reading of a file. Skipping lines you do not need will take longer than you might guess (based on some limited experience).
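
For comparison outside R, here is a minimal Python sketch of the low-overhead style of reading that connections are being contrasted with. It assumes the input is opened in binary mode, so lines are split on raw newline bytes with no re-encoding, newline translation or pushback layer, and only the lines actually kept are decoded. `select_lines` is a hypothetical name, and this does not model R's connection internals.

```python
from typing import Iterable, List


def select_lines(raw_lines: Iterable[bytes], wanted: Iterable[int],
                 encoding: str = "utf-8") -> List[str]:
    """Return the lines whose 0-based numbers are in `wanted`.

    `raw_lines` is assumed to yield undecoded byte lines, e.g. a file opened
    with open(path, "rb"). Unwanted lines are skipped without decoding, and
    reading stops once the highest wanted line has been seen.
    """
    targets = sorted(set(wanted))
    if not targets:
        return []
    out: List[str] = []
    next_idx = 0  # index into `targets` of the next line we want
    for lineno, raw in enumerate(raw_lines):
        if lineno == targets[next_idx]:
            # Decode only the lines we keep; strip the line terminator.
            out.append(raw.decode(encoding).rstrip("\r\n"))
            next_idx += 1
            if next_idx == len(targets):
                break  # nothing left to find: skip the rest of the input
    return out
```

Typical use would be `with open("big.dat", "rb") as f: rows = select_lines(f, [10, 500, 99999])` for a hypothetical file `big.dat`. Line numbers beyond the end of the input are silently ignored.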

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Jan 06 19:17:20 2006

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:41:54 EST