Re: [R] help with regexpr in gsub

From: Seth Falcon
Date: Thu 18 Jan 2007 - 00:46:48 GMT

"Kimpel, Mark William" writes:

> I have a very long vector of character strings of the format
> "GO:0008104.ISS" and need to strip off the dot and anything that follows
> it. There are always 10 characters before the dot. The actual characters
> and the number of them after the dot is variable.
> So, I would like to return in the format "GO:0008104" . I could do this
> with substr and loop over the entire vector, but I thought there might
> be a more elegant (and faster) way to do this.
> I have tried gsub using regular expressions without success. The code
> gsub(pattern= "\.*?" , replacement="", x=character.vector)

I guess you want:

    sub("([GO:0-9]+)\\..*$", "\\1", goids)

[You don't need gsub here]
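
A small sketch of what that pattern does, using a made-up three-element vector (`ids` is a hypothetical stand-in for the real data). Two things seem to go wrong in the attempted pattern: inside an R string the backslash itself has to be doubled, and in `\\.*?` the quantifier applies to the escaped dot, so it means "zero or more literal dots" rather than "a dot followed by anything". The second call below is an alternative that simply deletes from the first dot onward, with no capture group.

```r
# Hypothetical sample input; the real vector would be much longer.
ids <- c("GO:0008104.ISS", "GO:0000001.IEA", "GO:0016020.TAS.extra")

# Capture the leading run of G, O, ':' and digits, then replace the
# whole match (ID, dot, and everything after it) with the captured ID.
a <- sub("([GO:0-9]+)\\..*$", "\\1", ids)

# Alternative: match from the first literal dot to the end of the
# string and replace it with nothing.
b <- sub("\\..*$", "", ids)

print(a)
print(identical(a, b))
```

sub() replaces only the first match in each element, which is all that is needed because the pattern already runs to the end of the string.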

But I don't understand why you wouldn't want to use substr. At least for me substr looks to be about 20x faster than sub for this problem...

  > library(GO)
  > goids = ls(GOTERM)
  > gids = paste(goids, "ISS", sep=".")
  > gids[1:10]
   [1] "GO:0000001.ISS" "GO:0000002.ISS" "GO:0000003.ISS" "GO:0000004.ISS"
   [5] "GO:0000006.ISS" "GO:0000007.ISS" "GO:0000009.ISS" "GO:0000010.ISS"
   [9] "GO:0000011.ISS" "GO:0000012.ISS"
  > system.time(z <- substr(gids, 0, 10))
     user  system elapsed
    0.008   0.000   0.007
  > system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))
     user  system elapsed
    0.136   0.000   0.134
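
For anyone who wants to try the comparison without the GO package, here is a rough self-contained version. The IDs are synthetic (built with sprintf, not taken from GOTERM), and the vector length `n` is an arbitrary choice, so the timings will differ from the ones above. It also shows that no loop is needed: substr is vectorised and trims every element in one call.

```r
# Synthetic IDs standing in for ls(GOTERM); the GO package is not needed.
n <- 20000
gids <- sprintf("GO:%07d.ISS", seq_len(n))

# substr is vectorised: one call keeps the first 10 characters of
# every element, so there is no explicit loop over the vector.
t_substr <- system.time(z <- substr(gids, 1, 10))

# The regular-expression version, timed the same way.
t_sub <- system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))

# Both approaches should give the same result.
stopifnot(identical(z, z2))

print(t_substr)
print(t_sub)
```

The fixed-width approach only works because there are always exactly 10 characters before the dot; if that ever stops being true, the sub() version is the safer one.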

+ seth

Received on Thu Jan 18 11:54:00 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 18 Jan 2007 - 01:30:27 GMT.