[Rd] Reading a repeated fixed format

From: Douglas Bates <dmbates_at_gmail.com>
Date: Wed 10 Aug 2005 - 20:41:39 GMT

The Harwell-Boeing format for exchanging matrices is one of those lovely legacy formats that is based on fixed-format Fortran specifications and 80-character records. (Those of you who don't know why they would be 80 characters instead of, say, 60 or 100 can ask one of us old-timers some day and we'll tell you long, boring stories about working with punched cards.)

Reading this format would take about 10 lines of R code if it were not for the fact that it allows things like 40 two-digit integers to be written as one 80-character record with no separators. This actually made sense to some people once upon a time.

I could use read.fwf or, better, reuse some of the code inside the read.fwf function to extract the strings that should have been separated and convert them to numeric values, but I have been wondering whether there is a cleverer way of doing this. I know the number of records and the number of elements to read and, if it would help, I can assemble the records into one long text string.
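For concreteness, the read.fwf route might look roughly like the sketch below. The record is made up for illustration (40 two-digit integers, 10 through 49, run together into 80 characters); a real Harwell-Boeing file would take its field width and count from the format line in the header.

```r
# Hypothetical record: 40 two-digit integers with no separators.
rec <- paste(10:49, collapse = "")
stopifnot(nchar(rec) == 80)

# read.fwf accepts a connection and splits each record at the given widths.
con <- textConnection(rec)
fields <- read.fwf(con, widths = rep(2, 40))
close(con)

# The result is a one-row data frame; flatten it to a plain vector.
vals <- unlist(fields, use.names = FALSE)
```

This works, but it goes through a temporary file and read.table, which is the overhead I would like to avoid.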

Can anyone think of a vectorized way to extract successive substrings of length k or, perhaps, a way to use regular expressions to insert a blank after every k characters?
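To make the question concrete, here is a minimal sketch of both ideas, assuming the records have already been pasted into one string `txt` and that every field has the same width `k` (both names are just for illustration). It is one possible approach, not necessarily the best one.

```r
# Hypothetical input: 40 two-digit integers in one 80-character string.
txt <- paste(sprintf("%02d", 1:40), collapse = "")
k <- 2L  # assumed field width

# Vectorized extraction: substring() recycles txt over the start positions.
starts <- seq.int(1L, nchar(txt), by = k)
vals <- as.integer(substring(txt, starts, starts + k - 1L))

# Regular-expression variant: insert a blank after every k characters,
# then let scan() split on whitespace.
spaced <- gsub(sprintf("(.{%d})", k), "\\1 ", txt)
vals2 <- scan(text = spaced, what = integer(), quiet = TRUE)
```

Both variants assume that the string length is a multiple of k; a short final field would need separate handling.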

R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Aug 11 06:45:42 2005
