Re: [Rd] Reading a repeated fixed format

From: Prof Brian Ripley <>
Date: Wed 10 Aug 2005 - 21:03:42 GMT

On Wed, 10 Aug 2005, Douglas Bates wrote:

> The Harwell-Boeing format for exchanging matrices is one of those
> lovely legacy formats that is based on fixed-format Fortran
> specifications and 80 character records. (Those of you who don't know
> why they would be 80 characters instead of, say, 60 or 100 can ask one
> of us old-timers some day and we'll tell you long, boring stories
> about working with punched cards.)
> Reading this format would take about 10 lines of R code if it were not
> for the fact that it allows things like 40 two-digit integers to be
> written as one 80 character record with no separators. This actually
> made sense to some people once upon a time.
> I could use read.fwf or, better, use some of the code in the read.fwf
> function to extract the strings that should have been separated and
> convert them to numeric values but I have been trying to think if
> there is a more clever way of doing this. I know the number of
> records and the number of elements to read and, if it would help, I
> can assemble the records into one long text string.
> Can anyone think of a vectorized way to extract successive substrings
> of length k or, perhaps, a way to use regular expressions to insert a
> blank after every k characters?

substring() can do that:

st <- "1234567890abcdef"
lens <- seq(0, nchar(st), 2)                     # field boundaries: 0, 2, 4, ..., 16
substring(st, 1+lens[-length(lens)], lens[-1])   # starts 1, 3, ..., 15; ends 2, 4, ..., 16
[1] "12" "34" "56" "78" "90" "ab" "cd" "ef"

as it is vectorized internally over its start and stop positions.
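A minimal sketch of how this could be applied to Harwell-Boeing style records, assuming they arrive as a character vector (e.g. from readLines()) and that each full record holds `per_line` fields of `width` characters, as in a Fortran format such as (40I2). The function name read_fixed and its arguments are hypothetical, not part of base R or any package. It applies the same substring() idea per record rather than to one pasted string, so records that were padded or trimmed of trailing blanks do not shift the fields. The last few lines show the regular-expression route from the original question: insert a blank after every k characters and let scan() split on whitespace.

```r
# Hypothetical helper (not part of base R): split fixed-width records into
# numeric fields using the vectorized substring() shown above.
read_fixed <- function(records, per_line, width, n) {
  # Start and end column of every field, repeated once per record
  first <- rep(seq(1, by = width, length.out = per_line), times = length(records))
  last  <- first + width - 1
  # One copy of each record per field; substring() vectorizes over all three
  fields <- substring(rep(records, each = per_line), first, last)
  # Keep the first n fields; a short last record yields "" beyond its end
  fields <- fields[seq_len(n)]
  # Fortran may write reals with a D exponent (e.g. 1.0D+00); R expects E
  as.numeric(chartr("Dd", "Ee", fields))
}

# Two-digit integers with no separators, as in a (5I2) record
read_fixed("1011121314", per_line = 5, width = 2, n = 5)   # 10 11 12 13 14

# Regular-expression alternative: insert a blank after every 2 characters,
# then let scan() split on whitespace. This assumes no field is entirely
# blank, since scan() would silently skip such a field.
spaced <- gsub("(.{2})", "\\1 ", "1234567890")   # "12 34 56 78 90 "
vals <- scan(text = spaced, quiet = TRUE)         # 12 34 56 78 90
```

When every record is exactly full width, pasting them with paste0(records, collapse = "") and using the substring() call above on the single string gives the same result; the per-record version only matters when record lengths vary.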

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Thu Aug 11 07:05:57 2005
