[R] Parsing

From: Paolo Sonego <paolo.sonego_at_gmail.com>
Date: Wed, 09 Jul 2008 11:33:28 +0200


Dear R users,

I have a big text file formatted like this:

x      x_string
y      y_string

id1 id1_string
id2 id2_string
z      z_string
w      w_string

stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x      x_string1
y      y_string1
z      z_string1
w      w_string1

stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x      x_string2
y      y_string2

id1 id1_string1
id2 id2_string1
z      z_string2
w      w_string2

stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
...
...

I'd like to parse this file, retrieve the x, y, id1, id2, z and w fields, and save them in a matrix object:

x         y         id1         id2         z         w
x_string  y_string  id1_string  id2_string  z_string  w_string
x_string1 y_string1 NA          NA          z_string1 w_string1
x_string2 y_string2 id1_string1 id2_string1 z_string2 w_string2

...

...

The id1 and id2 fields are not always present within a section (the interval between x and the last stuff line), and
I'd like to insert an NA when they are absent (see above), so that all columns have the same length:
length(x) == length(y) == length(id1) == ... .

Without the id1 and id2 fields, the task is easily solved by importing the text file with readLines and retrieving the individual fields with grep:

input <- readLines("file.txt")
x <- grep("^x\\s", input, value = TRUE)
id1 <- grep("^id1\\s", input, value = TRUE)
...
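
For illustration only, here is a minimal sketch of one way the grep approach above might be extended to cope with the optional fields. It is not a tested solution for the real file. It assumes that every section ends with a line containing exactly "//", that no "stuff" line starts with a field name followed by whitespace, and that each field occurs at most once per section. The sample input, the helper extract_field and the variables section and n_sections are hypothetical names introduced for the example.

```r
# Hypothetical sample input; a real file would be read with readLines("file.txt")
input <- c(
  "x      x_string", "y      y_string", "",
  "id1 id1_string", "id2 id2_string",
  "z      z_string", "w      w_string", "",
  "stuff stuff stuff", "//",
  "x      x_string1", "y      y_string1",
  "z      z_string1", "w      w_string1", "",
  "stuff stuff stuff", "//",
  "x      x_string2", "y      y_string2", "",
  "id1 id1_string1", "id2 id2_string1",
  "z      z_string2", "w      w_string2", "",
  "stuff stuff stuff", "//"
)

fields <- c("x", "y", "id1", "id2", "z", "w")

# Assign a section number to every line: the counter goes up on the
# line that follows each "//" terminator (assumed section delimiter).
is_end <- input == "//"
section <- cumsum(c(FALSE, head(is_end, -1))) + 1
n_sections <- max(section)

# For one field: start with NA for every section, then fill in the
# value for each section in which the field is actually present.
extract_field <- function(field) {
  hit <- grepl(paste0("^", field, "\\s"), input)
  out <- rep(NA_character_, n_sections)
  out[section[hit]] <- sub("^\\S+\\s+", "", input[hit])
  out
}

# One column per field; cbind takes the column names from the list names.
result <- do.call(cbind, sapply(fields, extract_field, simplify = FALSE))
print(result)
```

The idea is that indexing by section number, rather than relying on the order of the grep matches, keeps every column the same length, so sections lacking id1 or id2 simply retain their NA. The sapply call hides a loop over the six field names, but there is no explicit loop over lines or sections.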

I'd like to accomplish this task entirely in R (no SQL, no perl script), possibly without using loops.

Any suggestions are quite welcome!

Regards,
Paolo



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 09 Jul 2008 - 09:45:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 09 Jul 2008 - 13:31:49 GMT.