Re: [R] sequential processing

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue 23 Jan 2007 - 08:12:12 GMT

On Mon, 22 Jan 2007, Gerard Smits wrote:

> So, I take it, given that the use of a pipe is suggested for
> sequential reading, that the standard approach to processing a data
> frame is to load the entire file? Please correct if wrong.

Yes, because most data frames are tiny compared to current RAM sizes. But the fact that R has connections, and many ways to read from them, indicates that other approaches are also supported. Large datasets are often kept in DBMSs, and data are transferred to R as required.
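As an illustration of the connection-based approach (a sketch, not code from the thread): an open file connection lets readLines() pull a file in fixed-size chunks, so only one chunk is held in memory at a time. The temporary file, its contents and the chunk size of 4 lines are assumptions made only to keep the example self-contained.

```r
# Write a small example file so the sketch runs on its own
# (in practice this would be an existing large file).
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:10), path)

# Open the connection once: successive readLines() calls on an
# already-open connection continue where the previous call stopped.
con <- file(path, open = "r")
total <- 0
n_chunks <- 0
repeat {
  chunk <- readLines(con, n = 4)   # at most 4 lines per chunk
  if (length(chunk) == 0) break    # end of file reached
  total <- total + sum(as.numeric(chunk))
  n_chunks <- n_chunks + 1
}
close(con)
unlink(path)

cat("sum:", total, "chunks:", n_chunks, "\n")
```

The same pattern applies to other connection types such as gzfile() or pipe(), and read.table() also accepts an open connection together with nrows, although the header then has to be handled on the first call only.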

There is an 'R Data Import/Export' manual, and this would have illuminated the subject for you.

> BTW, I am not interested in finding direct translations of SAS data
> step statements to R, but instead in finding an approach by which I
> can address the type of problems I consistently have to deal with
> (grouped processing with retention of baseline records, etc.). I'll
> read more on indexing as a means of dealing with relative-position issues.
>
> Thanks,
>
> Gerard
>
>> You could also load the entire file into a DBMS then pull parts of it
>> into R, or read specific lines through a pipe e.g.
>> readLines(pipe("sed, grep, python... command")).
>>
>> Don't try to replicate the SAS processing into R. The exact
>> translations of the SAS DATA STEP usage of _N_, first., last., retain
>> etc into R would be: inefficient, ugly, retrogressive, wrong, rigid,
>> complicated, silly and so on. For a start, read up on indexing - this
>> seemingly simple and innocuous R feature is in fact far more powerful
>> than the entire DATA STEP with its whole bag of tricks. Then search
>> the list for similar questions, for example
>> http://thread.gmane.org/gmane.comp.lang.r.general/44332/focus=44343
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces@stat.math.ethz.ch
>>> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Gerard Smits
>>> Sent: Sunday, January 21, 2007 2:22 PM
>>> To: r-help@stat.math.ethz.ch
>>> Subject: [R] sequential processing
>>>
>>> Like many others, I am new to R but old to SAS.
>>>
>>> Am I correct in understanding that R processes a data frame
>>> sequentially? This would imply that large input files could be
>>> read without the need to load the entire file into memory.
>>> Related to the manner of reading a frame, I have been looking for the
>>> equivalent of SAS _n_ (I realize that I can use a variant of which()
>>> to identify an index value), as well as for useful SAS features such
>>> as first., last., retain, etc. Any help with this conversion would be
>>> appreciated.
>>>
>>> Thanks,
>>>
>>> Gerard Smits
>>>
>>> ______________________________________________
>>> R-help@stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
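To make the indexing advice concrete for the grouped-processing case Gerard describes, here is a minimal sketch (not from the thread) of base R counterparts to _N_, first., last. and retain. The data frame df and its columns id, visit and value are hypothetical, and the code assumes each subject's first visit is its baseline record.

```r
# Hypothetical long-format data: repeated visits per subject.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  visit = c(0, 1, 2, 0, 1, 0),
  value = c(10, 12, 15, 20, 18, 30)
)
# Sort by subject, then visit (the analogue of PROC SORT; BY id visit).
df <- df[order(df$id, df$visit), ]

# _N_: the row number is simply the index.
df$n <- seq_len(nrow(df))

# first.id / last.id: flag the first and last row of each subject.
df$first <- !duplicated(df$id)
df$last  <- !duplicated(df$id, fromLast = TRUE)

# RETAIN a baseline: match() returns the position of the first
# occurrence of each id, i.e. that subject's baseline row.
df$baseline <- df$value[match(df$id, df$id)]
df$change   <- df$value - df$baseline

# RETAIN used as a running total within each subject.
df$running <- ave(df$value, df$id, FUN = cumsum)

# Keep one record per subject (the SAS "if last.id;" idiom).
last_rows <- df[df$last, ]
print(last_rows)
```

Each step operates on whole columns at once instead of looping row by row, which is the point of preferring indexing to a literal DATA STEP translation.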

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Tue Jan 23 19:16:51 2007
