[R] regexpr and parsing question

From: Kimpel, Mark William <mkimpel_at_iupui.edu>
Date: Tue 30 Jan 2007 - 22:23:45 GMT

The main problem I am trying to solve is this:

I am importing a tab-delimited file whose first line contains only one column, a descriptor of the form "col_1 col_2 col_3"; that is, the column names are separated by whitespace rather than by tabs. I would like to parse this first line so that it becomes the colnames of the rest of the file, which I am reading into R using read.delim(). The file is so huge that I must do this in R.

My first question is this: What is the best way to accomplish what I want to do?
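For illustration, here is a minimal sketch of one way this might be approached: read the header line on its own, split it on whitespace, and pass the result to read.delim(). The temporary file and the names path, header, col_names and dat are stand-ins invented for this example, and trimws() assumes R 3.2.0 or later.

```r
# Hypothetical stand-in for the real file: a whitespace-separated header
# line followed by tab-delimited data rows.
path <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3", "1\t2\t3", "4\t5\t6"), path)

# Read only the first line and split it on runs of whitespace.
header <- readLines(path, n = 1)
col_names <- strsplit(trimws(header), "[[:space:]]+")[[1]]

# Read the remaining lines as tab-delimited data, skipping the header
# line and supplying the parsed names.
dat <- read.delim(path, header = FALSE, skip = 1, col.names = col_names)
```

Because only one line is read before the main import, the size of the file should not matter for the header step.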

My other questions revolve around some failed attempts on my part to solve the problem on my own using regular expressions. I thought that perhaps I could change the first line to c("col_1", "col_2", "col_3") using gsub. I was having trouble figuring out how R uses the backslash character, because I know that sometimes a single backslash in Perl needs to be a double backslash in R.

Here is a sample of what I tried and what I got:

> a <- "col_1 col_2 col_3"
> gsub("\\s", " ", a)
[1] "col_1 col_2 col_3"
> gsub("\\s", "\\s", a)
[1] "col_1scol_2scol_3"

As you can see, it looks like R is taking a regular expression for "pattern", but not taking it for "replacement". Why is this?
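A short sketch of what seems to be going on. The R string literal "\\s" is two characters, a backslash and an s. In the pattern argument the regex engine reads those as the whitespace class; in the replacement argument only backreferences such as \\1 are special, and a backslash before any other character simply yields that character, which would explain the "s" in the output above. The variable names here are chosen for the example only.

```r
a <- "col_1 col_2 col_3"

# The literal "\\s" holds two characters: a backslash and an s.
n_chars <- nchar("\\s")

# In the pattern, the regex engine interprets \s as "any whitespace".
underscored <- gsub("\\s", "_", a)

# In the replacement, \\1 refers back to the first parenthesised group.
wrapped <- gsub("(\\w+)", "<\\1>", a)
```

Here underscored is "col_1_col_2_col_3" and wrapped is "<col_1> <col_2> <col_3>".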

Assuming that I did want to solve my original problem with gsub and then turn the string into an R object, how would I get gsub to return c("col_1", "col_2", "col_3") from my original string?
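As a sketch only, the string could be built by replacing each run of whitespace with the three characters quote, comma-space, quote and then wrapping the result. The names code, via_eval and via_split are invented for this example. The last line shows that strsplit() reaches the same vector without generating and evaluating code, which may be the simpler route.

```r
a <- "col_1 col_2 col_3"

# Replace each run of whitespace with '", "' and wrap the result so the
# text reads: c("col_1", "col_2", "col_3")
code <- paste0('c("', gsub("\\s+", '", "', a), '")')

# Turn that text into an actual character vector.
via_eval <- eval(parse(text = code))

# The same vector obtained directly, with no generated code.
via_split <- strsplit(a, "\\s+")[[1]]
```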

Finally, is there a way to declare a string as a regular expression so that R sees it the same way other languages, such as Perl, do, i.e. so that the backslash is interpreted the same way? For someone who is just learning regular expressions, as I am, it is very frustrating to read about them in references and then have to translate what I've learned into R syntax. I was thinking that instead of enclosing the string in "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we use I() in formulae.
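For readers on a much newer R than this post assumes: R 4.0.0 and later accept raw string literals of the form r"(...)", in which backslashes are not escape characters, and perl = TRUE asks for Perl-compatible matching. The sketch below assumes such a version; the names same and parts are invented for the example.

```r
a <- "col_1 col_2 col_3"

# A raw string holds the same characters as the doubled-backslash form.
same <- identical(r"(\s+)", "\\s+")

# The pattern can then be written as it would appear in a Perl reference.
parts <- strsplit(a, r"(\s+)", perl = TRUE)[[1]]
```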

These are a bunch of questions, but obviously I have a lot to learn!



Mark W. Kimpel MD  

(317) 490-5129 Work, & Mobile  

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Jan 31 09:28:12 2007
