Re: [R] Reading fixed column format

From: Steve Miller <smiller73_at_jhu.edu>
Date: Wed 13 Sep 2006 - 22:54:32 GMT


How about using Python, Perl or Ruby, which are designed precisely for this type of routine data munging, to pipe the processed output into an R data frame?  

msci <- read.table(pipe("python steve/python/msci.py"), header=TRUE, as.is=TRUE)  

Iteratively, you could deliver the Python output in chunks, something like:  

msci <- read.table(pipe("python steve/python/msci.py 1 500000"), header=TRUE, as.is=TRUE)  

msci <- rbind(msci, read.table(pipe("python steve/python/msci.py 500001 1000000"), header=TRUE, as.is=TRUE))  

etc.  

Steve Miller    
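The thread does not include msci.py itself. The following is only a minimal sketch of what such a script might look like, assuming a hypothetical input file `msci.dat`, a hypothetical two-field layout in `FIELDS`, and that the two optional arguments are the first and last row numbers (inclusive), as the calls above suggest. The function names `extract`, `convert` and `main` are likewise illustrative.

```python
# Hypothetical sketch of a script in the spirit of msci.py (the real script is
# not shown in the thread). It reads a fixed-width file, keeps selected
# columns and prints tab-separated rows with a header, suitable for
# read.table(pipe("python msci.py 1 500000"), header=TRUE, as.is=TRUE) in R.
import sys

# Hypothetical layout: (name, first, last) character positions, 1-based and
# inclusive as in `cut -c`. Replace with the real record layout.
FIELDS = [("id", 2, 3), ("value", 6, 8)]

# Hypothetical input file name.
INPUT = "msci.dat"


def extract(line, fields=FIELDS):
    """Return each field's text, stripped; empty fields become R's NA."""
    return [line[first - 1:last].strip() or "NA" for _, first, last in fields]


def convert(lines, out, start=1, stop=None, fields=FIELDS):
    """Write a header, then rows start..stop (1-based, inclusive), to `out`.

    The header is repeated for every chunk because each read.table(...,
    header=TRUE) call in R expects one.
    """
    out.write("\t".join(name for name, _, _ in fields) + "\n")
    for rownum, line in enumerate(lines, start=1):
        if rownum < start:
            continue  # rows before the chunk are read but not written
        if stop is not None and rownum > stop:
            break  # no need to read past the end of the chunk
        out.write("\t".join(extract(line.rstrip("\n"), fields)) + "\n")


def main(argv):
    """Usage (assumed): python msci.py [start stop]"""
    start = int(argv[1]) if len(argv) > 1 else 1
    stop = int(argv[2]) if len(argv) > 2 else None
    with open(INPUT) as f:
        convert(f, sys.stdout, start, stop)


# When saved as a script, end the file with:
#     if __name__ == "__main__":
#         main(sys.argv)
```

Because read.table splits on any run of whitespace by default, this assumes no field contains embedded blanks; if some do, adding sep="\t" to the read.table calls would keep the columns aligned. Note also that rows before `start` still have to be read, so each later chunk takes a little longer to reach its first row.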

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Jason Barnhart
Sent: Wednesday, September 13, 2006 11:52 AM
To: Gabor Grothendieck; Anupam Tyagi
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Reading fixed column format  

Another possibility:  

  1. Split the original file into smaller chunks of xx,xxx rows.
  2. Process each file using read.fwf, saving the requisite variables.
     (If necessary, save each intermediate matrix/data.frame to disk
     to conserve memory.)
  3. 'rbind' the results.

Not exactly elegant but it works.  
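Step 1 is the part that is awkward to do from inside R. A minimal Python sketch of it follows, assuming hypothetical file names (`prefix_001.dat`, `prefix_002.dat`, ...) and a row count chosen by the user; the function name `split_file` is illustrative, not from the thread.

```python
# Sketch of step 1 (splitting a large file into pieces of a fixed number of
# rows). Input and output file names are hypothetical.
from itertools import islice


def split_file(path, rows_per_chunk, prefix="chunk"):
    """Copy consecutive blocks of `rows_per_chunk` lines from `path` into
    prefix_001.dat, prefix_002.dat, ... and return the names written.

    Only one block is held in memory at a time.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    names = []
    with open(path) as src:
        while True:
            block = list(islice(src, rows_per_chunk))  # next block of lines
            if not block:
                break  # end of input
            name = f"{prefix}_{len(names) + 1:03d}.dat"
            with open(name, "w") as dst:
                dst.writelines(block)
            names.append(name)
    return names
```

Each piece could then be read in R with read.fwf(name, widths = ...) and the results combined with do.call(rbind, ...). On Unix-like systems the standard `split -l` utility performs the same split without any code.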

From: "Gabor Grothendieck" <ggrothendieck@gmail.com>
To: "Anupam Tyagi" <AnupTyagi@yahoo.com>
Cc: <r-help@stat.math.ethz.ch>
Sent: Wednesday, September 13, 2006 7:21 AM
Subject: Re: [R] Reading fixed column format    

> On 9/13/06, Anupam Tyagi <AnupTyagi@yahoo.com> wrote:
>> Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
>>
>> > C:\bin>cut -c2-3,6-8 a.dat
>> > 23678
>> > 23678
>> > 23678
>>
>> Thanks. I think this will work. How do I redirect the output to a file on
>> Windows?
>
> Same as on UNIX:
>
> cut -c2-3,6-8 a.dat > a2.dat
>
>> Is there a simple way to convert the cut command to a script on Windows,
>
> Using Notepad or another text editor, put it in a file a.bat and then
> issue this command from the console:
>
> a.bat
>
> Note that you could process it multiple times if you like:
>
> cut -c1-3,6-8 a.dat > a2.dat
> cut -c2- a2.dat > a3.dat
>
> produces the same thing but uses 2 passes and so keeps each command line shorter.
> Be sure you do it from the tail end forward as shown above to avoid having
> to recalculate the positions: the first pass only alters the line to the
> right of column 3, so in the second pass column 2 is still column 2.
>
>> because the entire command may not fit on one line? Anupam.
>>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
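To make the column arithmetic in the exchange above concrete, here is a small Python sketch (not part of the thread) of what `cut -c` selects, together with the "tail end forward" rule for deleting column ranges. The sample line `123456789` is an assumption: the thread shows only the output `23678`, not the contents of a.dat. The helper names `cut_chars` and `delete_ranges` are illustrative.

```python
# Sketch of what `cut -c LIST` keeps, plus the "tail end first" rule for
# deleting column ranges. Ranges are 1-based and inclusive, as in cut.


def cut_chars(line, ranges):
    """Keep the characters in `ranges`, like `cut -c2-3,6-8`.

    cut writes the selected characters in the order they appear in the
    line, whatever order the ranges are listed in, so the positions are
    merged and sorted first. Positions past the end of the line are ignored.
    """
    keep = sorted({i for start, end in ranges for i in range(start, end + 1)})
    return "".join(line[i - 1] for i in keep if i <= len(line))


def delete_ranges(line, ranges):
    """Delete column ranges, working from the tail end forward.

    Removing the right-most range first leaves the positions of the ranges
    further left unchanged, so none of them has to be recalculated.
    Assumes the ranges do not overlap.
    """
    for start, end in sorted(ranges, reverse=True):
        line = line[:start - 1] + line[end:]
    return line
```

For example, `cut_chars("123456789", [(2, 3), (6, 8)])` gives `"23678"`, and deleting columns 1, 4-5 and 9 with `delete_ranges` gives the same string. Deleting from the front instead would shift every later column left, which is exactly the recalculation the tail-first order avoids.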
 





Received on Thu Sep 14 08:58:53 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 13 Sep 2006 - 23:30:05 GMT.
