Re: [R] Preprocessing troublesome files in R - looking for some perl like functionality

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Fri 03 Jun 2005 - 00:38:43 EST

On Thu, 2 Jun 2005, Peter Dalgaard wrote:

> "Andy Bunn" <abunn@whrc.org> writes:
>
>> Hi all:
>>
>> I have acquired 100s of data files that I need to preprocess to get them
>> usable in R. The files are fixed width (to a point) and contain 1 to 3 lines
>> of header, followed by a variable number of fixed width data lines (that I
>> can read with read.fwf). I want to read through the files and remove every
>> _line_ where the characters in columns 83-86 do not equal "STD". If I can do that
>> and store it in a text file, then I can get the data I need using read.fwf.
>> I can't figure out how to do this because of the irregularity of the header
>> info buried in the file. It seems like the kind of thing perl or emacs would
>> be good at but I'd like to do it all in R if possible. Any pointers
>> appreciated.
>
> How large are the files? With today's RAM sizes, it could be feasible
> to do something along the lines of
>
> 1) x <- readLines(....), i <- read.fwf(...col83-86...)
> 2) read.fwf(textConnection(x[i %in% "STD"]),......)

or use an anonymous file() connection (file() called with no arguments), which will be faster than a textConnection for large files (and which read.fwf should probably use internally).

I would have used

x <- readLines(...)
tmp <- file()
writeLines(x[substr(x, 83, 86) == "STD"], tmp)
read.fwf(tmp, ...)
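
For anyone wanting to try this end to end, here is a minimal, self-contained sketch of the same idea. The record layout, the sample data, the widths and the helper make_line() are all made up for illustration and are not from the original files. Since columns 83-86 span four characters while "STD" has three, the sketch assumes the field is space-padded and trims it before comparing; adjust the column range or the comparison to suit the real data.

```r
# Hypothetical layout: id (5 chars), value (8 chars), 69 chars of filler,
# then a 4-character status code in columns 83-86.
make_line <- function(id, value, code) {
  sprintf("%-5s%8.2f%-69s%-4s", id, value, "", code)
}

sample_lines <- c(
  "SITE REPORT 2005",   # header line, shorter than 83 characters
  "id   value",         # second header line
  make_line(c("A1", "A2", "A3"), c(1.5, 2.25, 3), c("STD", "XYZ", "STD"))
)
infile <- tempfile(fileext = ".txt")
writeLines(sample_lines, infile)

x <- readLines(infile)

# substr() returns "" for lines shorter than 83 characters, so the
# irregular header lines drop out without special handling.
# Assumption: the code is padded with spaces, hence trimws().
keep <- trimws(substr(x, 83, 86)) == "STD"

tmp <- file()           # anonymous file connection, opened for read and write
writeLines(x[keep], tmp)
dat <- read.fwf(tmp,
                widths = c(5, 8, -69, 4),   # negative width skips the filler
                col.names = c("id", "value", "code"),
                strip.white = TRUE,
                stringsAsFactors = FALSE)
close(tmp)              # closing the connection discards the anonymous file
unlink(infile)
```

The same filter can be applied file by file, e.g. inside lapply() over list.files(), if every file shares the layout.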

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri Jun 03 00:42:44 2005
