Re: [R] sequential processing

From: bogdan romocea <br44114_at_gmail.com>
Date: Mon 22 Jan 2007 - 21:16:28 GMT


One option for processing very large files with R is split:
  ## split a large file into pieces
  #--parameters: the folder, file and number of parts
  FLD=/home/user/data
  F=very_large_file.dat
  parts=50
  #---split
  cd $FLD
  fn=`echo $F | awk -F\. '{print $1}'` #file name without extension
  ln=`wc -l $F | awk '{print $1}'` #number of lines in the file
  forsplit=`expr $ln / $parts + 1` #number of lines in each part
  echo "====== $F will be split in $parts parts of $forsplit lines each."
  split -l $forsplit $F $fn
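After a split like the one above, it may be worth confirming that no lines were lost before feeding the pieces to R. The following is only a sketch: check_split is a hypothetical helper, not part of the original script, and it assumes split's default two-letter suffixes (aa, ab, ...).

```shell
# Hypothetical helper: succeed only if the pieces add up to the original.
# Assumes the pieces are named <prefix>aa, <prefix>ab, ... (split's default).
check_split() {
  orig=$1      # original file, e.g. very_large_file.dat
  prefix=$2    # prefix that was given to split, e.g. very_large_file
  total=$(wc -l < "$orig")                        # lines in the original
  in_pieces=$(cat "$prefix"[a-z][a-z] | wc -l)    # lines across all pieces
  [ "$total" -eq "$in_pieces" ]
}
```

With the variables from the script above, this could be called as: check_split $F $fn && echo "split OK"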
You could also load the entire file into a DBMS then pull parts of it into R, or read specific lines through a pipe e.g. readLines(pipe("sed, grep, python... command")).
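As a rough illustration of the "read specific lines" idea, here is one way the sed command might look. extract_lines is a hypothetical name used only for this sketch; the sed options themselves (-n, p, q) are standard.

```shell
# Hypothetical helper: print lines START..END of FILE, then stop reading.
extract_lines() {
  start=$1
  end=$2
  file=$3
  # -n suppresses the default output, 'p' prints the selected range,
  # and 'q' quits at line END so the rest of a very large file is never read.
  sed -n "${start},${end}p;${end}q" "$file"
}

# Small demonstration on a temporary 10-line file: prints lines 4 to 6.
demo=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$demo"
extract_lines 4 6 "$demo"
rm -f "$demo"
```

From R, the same command could presumably be passed through a pipe, along the lines of readLines(pipe("sed -n '4,6p;6q' very_large_file.dat")), so that only the requested lines ever reach R.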

Don't try to replicate the SAS processing into R. The exact translations of the SAS DATA STEP usage of _N_, first., last., retain etc into R would be: inefficient, ugly, retrogressive, wrong, rigid, complicated, silly and so on. For a start, read up on indexing - this seemingly simple and innocuous R feature is in fact far more powerful than the entire DATA STEP with its whole bag of tricks. Then search the list for similar questions, for example http://thread.gmane.org/gmane.comp.lang.r.general/44332/focus=44343

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Gerard Smits
> Sent: Sunday, January 21, 2007 2:22 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] sequential processing
>
> Like many others, I am new to R but old to SAS.
>
> Am I correct in understanding that R processes a data frame
> sequentially? This would imply that large input files could be
> read, without the need to load the entire file into memory.
> Related to the manner of reading a frame, I have been looking for the
> equivalent of SAS _n_ (I realize that I can use a variant of which to
> identify an index value) as well as useful SAS features such as
> first., last., retain, etc. Any help with this conversion
> appreciated.
>
> Thanks,
>
> Gerard Smits
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Tue Jan 23 08:28:08 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 22 Jan 2007 - 21:30:33 GMT.
