Re: [R] Preprocessing troublesome files in R - looking for some perl like functionality

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Fri 03 Jun 2005 - 00:22:31 EST

"Andy Bunn" <abunn@whrc.org> writes:

> Hi all:
>
> I have acquired 100s of data files that I need to preprocess to get them
> usable in R. The files are fixed width (to a point) and contain 1 to 3 lines
> of header, followed by a variable number of fixed width data lines (that I
> can read with read.fwf). I want to read through the files and remove every
> _line_ where the characters in columns 83-86 do not equal "STD". If I can do that
> and store it in a text file, then I can get the data I need using read.fwf.
> I can't figure out how to do this because of the irregularity of the header
> info buried in the file. It seems like the kind of thing perl or emacs would
> be good at but I'd like to do it all in R if possible. Any pointers
> appreciated.

How large are the files? With today's RAM sizes, it could be feasible to do something along the lines of

  1. x <- readLines(....), i <- read.fwf(...col83-86...)
  2. read.fwf(textConnection(x[i %in% "STD"]),......)
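As a rough, runnable sketch of the two steps above: the version below swaps the first read.fwf() call for substr() on the lines returned by readLines(), which is one possible way to pull out columns 83-86, not necessarily the only one. The helper name keep_std_lines, the demo lines, and the widths passed to read.fwf() are all made up for illustration; the real files will need their own widths, and the trimws() call assumes "STD" may be padded with a blank inside the four-character field.

```r
# Hypothetical helper: keep only the lines whose columns first..last,
# after trimming blanks, equal `tag`. Header lines shorter than `first`
# yield "" from substr() and are therefore dropped.
keep_std_lines <- function(lines, first = 83, last = 86, tag = "STD") {
  field <- trimws(substr(lines, first, last))
  lines[field == tag]
}

# Made-up stand-in for readLines("somefile"): one short header line and
# three 86-character data lines with a tag in columns 83-86.
demo <- c(
  "HEADER LINE",
  sprintf("%-82s%-4s", "  1.5  2.5", "STD"),
  sprintf("%-82s%-4s", "  3.5  4.5", "RAW"),
  sprintf("%-82s%-4s", "  5.5  6.5", "STD")
)

kept <- keep_std_lines(demo)

# Step 2: feed the surviving lines to read.fwf() through a text connection.
# The widths are assumptions for the demo; -72 skips the blank filler.
con <- textConnection(kept)
dat <- read.fwf(con, widths = c(5, 5, -72, 4))
close(con)
```

For the real files, readLines() on each file name would replace the demo vector, and writeLines(kept, ...) could store the filtered lines if a cleaned text file is wanted.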
-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907

Received on Fri Jun 03 00:29:29 2005