From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Tue 30 Jan 2007 - 23:52:11 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Jan 31 10:57:00 2007

And here is an alternative to the regular expressions (although again I don't think you really need any of this):

> capture.output(dput(strsplit("col1 col2 col3", " ")[[1]]))

[1] "c(\"col1\", \"col2\", \"col3\")"

On 1/30/07, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:

> Both spaces and tabs are whitespace, so this
> should be good enough (unless you can
> have empty fields):
>
> read.table("myfile.dat", header = TRUE)
>
> See the sep= argument in ?read.table .
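To illustrate the whitespace point, here is a sketch with made-up data, using read.table()'s text= argument (available in current R; older versions can wrap the data in textConnection()). With the default sep = "" any run of spaces or tabs counts as one separator; when a field can be empty, the tabs must not be merged, so sep = "\t" is needed.

```r
# Made-up data mixing spaces and tabs as separators
mixed <- "a b\tc\n1 2\t3\n4 5\t6"
# Default sep = "" splits on any run of whitespace
df1 <- read.table(text = mixed, header = TRUE)
names(df1)   # "a" "b" "c"

# Made-up tab-delimited data with an empty middle field
gappy <- "a\tb\tc\n1\t\t3"
# sep = "\t" keeps the empty field instead of collapsing the two tabs
df2 <- read.table(text = gappy, header = TRUE, sep = "\t")
df2$b        # NA: the blank field is read as a missing value
```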
>
> Although I don't think you really need this, here are
> some regular expressions for processing a header
> into the form you asked for. The first line places
> quotes around the names, the second one inserts
> commas and the last one adds c( and ).
>
> s <- gsub('(\\S+)', '"\\1"', 'col1 col2 col3')
> s <- gsub("(\\S+) ", "\\1, ", s)
> sub("(.*)", "c(\\1)", s)
>
> On 1/30/07, Kimpel, Mark William <mkimpel@iupui.edu> wrote:
> > The main problem I am trying to solve is this:
> >
> > I am importing a tab-delimited file whose first line contains only one
> > column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the
> > colnames are not tab-delimited but are separated by whitespace. I would
> > like to parse this first line so that it becomes the colnames
> > of the rest of the file, which I am reading into R using read.delim().
> > The file is so huge that I must do this in R.
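For the original problem, one way to avoid building a string at all is to read the first line on its own and hand the names to read.delim() through col.names. This is a sketch: the temporary file and its contents are made up, and splitting on "[[:space:]]+" assumes the header line has no leading whitespace (otherwise the first name comes back empty).

```r
# Hypothetical file standing in for the real (huge) data file
path <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3", "1\t2\t3", "4\t5\t6"), path)

# Read only the first line and split it on runs of whitespace
header <- readLines(path, n = 1)
nms <- strsplit(header, "[[:space:]]+")[[1]]

# Skip that line and supply the names; sep = "\t" is read.delim's default
dat <- read.delim(path, header = FALSE, skip = 1, col.names = nms)
str(dat)
```

For a very large file, also passing colClasses to read.delim() can reduce the time spent guessing column types.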
> >
> > My first question is this: what is the best way to accomplish what I
> > want to do?
> >
> > My other questions revolve around some failed attempts on my part to
> > solve the problem on my own using regular expressions. I thought that
> > perhaps I could change the first line to c("col_1", "col_2", "col_3")
> > using gsub. I was having trouble figuring out how R uses the backslash
> > character, because I know that sometimes the backslash one would use in
> > Perl needs to be a double backslash in R.
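The doubling happens at the string-literal level, before any regex engine is involved: R's parser turns \\ inside quotes into a single backslash, just as it turns \n into a newline. A Perl regex literal such as /\s/ skips that step, which is why the two look different. A quick check:

```r
pat <- "\\s"          # R source: two backslashes inside the quotes
nchar(pat)            # 2: the string holds one backslash and one "s"
writeLines(pat)       # prints \s, which is what the regex engine receives
gsub(pat, "_", "col_1 col_2 col_3")   # "col_1_col_2_col_3"
```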
> >
> > Here is a sample of what I tried and what I got:
> >
> > > a <- "col_1 col_2 col_3"
> > > gsub("\\s", " ", a)
> > [1] "col_1 col_2 col_3"
> > > gsub("\\s", "\\s", a)
> > [1] "col_1scol_2scol_3"
> >
> > As you can see, it looks like R is taking a regular expression for
> > "pattern", but not taking it for "replacement". Why is this?
> >
> > Assuming that I did want to solve my original problem with gsub and then
> > turn the string into an R object, how would I get gsub to return
> > c("col_1", "col_2", "col_3") using my original string?
> >
> > Finally, is there a way to declare a string as a regular expression so
> > that R sees it the same way other languages, such as Perl, do, i.e. so
> > that the backslash is interpreted the same way? For someone who is just
> > learning regular expressions as I am, it is very frustrating to read
> > about them in references and then have to translate what I've learned
> > into R syntax. I was thinking that instead of enclosing the string in
> > "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we
> > use I() in formulae.
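As far as I know base R has no wrapper like THIS.IS.A.REGULAR.EXPRESSION(). The two closest things I am aware of are perl = TRUE, which makes the regex functions use Perl-compatible syntax, and raw string literals r"(...)", in which backslashes are not escapes. Raw strings only arrived in R 4.0.0, long after this thread, so the second part of this sketch will not parse in older versions.

```r
a <- "col_1 col_2 col_3"

# perl = TRUE hands the pattern to PCRE, so Perl-only syntax such as the
# lookbehind (?<=_) works; the default engine does not accept it
tagged <- gsub("(?<=_)(\\d)", "<\\1>", a, perl = TRUE)   # "col_<1> col_<2> col_<3>"

# Raw string (R >= 4.0.0): the backslash is written once, as in Perl
pat_raw <- r"(\s+)"
identical(pat_raw, "\\s+")       # TRUE
strsplit(a, pat_raw)[[1]]        # "col_1" "col_2" "col_3"
```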
> >
> > These are a bunch of questions, but obviously I have a lot to learn!
> >
> > Thanks,
> >
> > Mark
> >
> > Mark W. Kimpel MD
> > (317) 490-5129 Work, & Mobile
> > (317) 663-0513 Home (no voice mail please)
> > 1-(317)-536-2730 FAX

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 31 Jan 2007 - 01:30:27 GMT.
