Re: [R] sscanf equivalent

From: Paul Roebuck <roebuck_at_wotan.mdacc.tmc.edu>
Date: Sun 09 Oct 2005 - 17:36:38 EST

On Fri, 7 Oct 2005, Prof Brian Ripley wrote:

> On Fri, 7 Oct 2005, Paul Roebuck wrote:
>
> > I have a data file from which I need to read portions of
> > data but data location/quantity can change from file to file.
> > I wrote some code and have a working solution but it seems
> > wasteful to have to do it this way. Here's the contrived
> > incomplete code.
> >
> > datalines <- readLines(datafile.pathname)
> > # marker will appear on line preceding and following
> > # actual data
> > offset.data <- grep("marker", datalines)
> > datalines <- NULL
> >
> > # grab first column of each assoc dataline
> > data <- scan(datafile.pathname,
> >              what = numeric(0),
> >              skip = offset.data[1],
> >              nlines = offset.data[2] - offset.data[1] - 1,
> >              flush = TRUE,
> >              multi.line = FALSE,
> >              quiet = TRUE)
> > # output is vector of values
> >
> > Originally wrote code to parse data from 'datalines'
> > using sub and strsplit methods but it was woefully slower
> > and more complex than using scan method. What is desired
> > is a means of invoking method like scan but with existing
> > data instead of filename.
>
> Why not use a text connection?

I tried that, but the result was far slower than the method above.

R> file.info(datafile.pathname)$size
[1] 944850
R> system.time(datalines<-readLines(datafile.pathname), TRUE)[3]
[1] 0.59
R> length(datalines)
[1] 67931
R> system.time(tconn<-textConnection(datalines), TRUE)[3]
[1] 52.97

Once the textConnection object had been created, the scan invocation using it took less than half the time of the corresponding filename-based invocation. The problem is that the filename-based scan was only taking a second to begin with, so the 50+ seconds spent creating the connection swamps any gain. And since grep doesn't accept a textConnection as its argument, I still need the otherwise unused 'datalines' variable and its associated memory. Even if grep did support that, creating the connection without the intermediate variable took even longer:

R> system.time(tconn<-textConnection(readLines(datafile.pathname)), TRUE)[3]
[1] 66.61
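
For anyone following along, here is a minimal sketch of the text-connection approach under discussion. The sample lines are made up for illustration and merely stand in for the real file; the scan arguments mirror the filename-based call quoted above. It is not necessarily the exact code that was timed.

```r
# Made-up stand-in for the contents of the data file (an assumption):
# a "marker" line precedes and follows the actual data.
datalines <- c("header",
               "marker",
               "1.5 a b",
               "2.5 c d",
               "3.5 e f",
               "marker",
               "trailer")

# Locate the marker lines in the in-memory copy
offset.data <- grep("marker", datalines)

# Scan from a text connection on the existing lines instead of the file;
# flush = TRUE keeps only the first field of each data line
tconn <- textConnection(datalines)
data <- scan(tconn,
             what = numeric(0),
             skip = offset.data[1],
             nlines = offset.data[2] - offset.data[1] - 1,
             flush = TRUE,
             multi.line = FALSE,
             quiet = TRUE)
close(tconn)

print(data)
```

On a vector this small the connection is created instantly; the cost reported above only shows up with tens of thousands of lines.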

Any other thoughts?

# R version 2.1.1, 2005-06-20, powerpc-apple-darwin7.9.0



SIGSIG -- signature too long (core dumped)

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sun Oct 09 17:41:52 2005