[R] Identify and extract a whole word of variable length using regular expressions

From: Giulio Di Giovanni <perimessaggini_at_hotmail.com>
Date: Mon, 28 Jun 2010 23:17:08 +0000

Hi everybody,

I'm quite weak with regular expressions, and I need some help... I have strings of the type

>a

[1,] "ppe46 Rv3018c MT3098/MT3101 MTV012.32c"
[2,] "ppe16 Rv1135c MT1168"
[3,] "ppe21 Rv1548c MT1599 MTCY48.17"
[4,] "ppe12 Rv0755c MT0779"
[5,] "PE_PGRS51 Rv3367"
[etc..for several hundreds]

I want to have instead only:

[1,] "Rv3018c"
[2,] "Rv1135c"
[3,] "Rv1548c"
[4,] "Rv0755c"
[5,] "Rv3367"

Besides these examples, the only thing I know for sure is that the "magic" substrings I want to extract are entire words, all starting with "Rv". So "Rvxxxxx", preceded and followed by a space, and of variable length. I don't have any other information.
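To make the requirement concrete, here is a minimal sketch of one possible way to pull out such words. It assumes the strings are held in a plain character vector `a`, that each "Rv" word consists only of letters and digits, and that `regmatches()` is available (it comes with more recent versions of R). The pattern itself is an assumption based on the examples above, not something confirmed for the whole data set.

```r
# Sample strings copied from the examples above (assumed to be a character vector)
a <- c("ppe46 Rv3018c MT3098/MT3101 MTV012.32c",
       "ppe16 Rv1135c MT1168",
       "ppe21 Rv1548c MT1599 MTCY48.17",
       "ppe12 Rv0755c MT0779",
       "PE_PGRS51 Rv3367")

# Assumed pattern: start of word, "Rv", then letters/digits up to the end of the word
pattern <- "\\<Rv[[:alnum:]]+\\>"

# regexpr() locates the first match in each string;
# regmatches() returns the matched text (strings without a match are dropped)
rv <- regmatches(a, regexpr(pattern, a))
print(rv)
```

Because strings without a match are dropped, the result can be shorter than `a`; if the positions must stay aligned, it may be safer to check `regexpr(pattern, a) > 0` first.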

Do you know how to pick them out? I checked for their presence using grep and the expression "\\<Rv*\\>", I tried some string functions from Hmisc, and I also tried the other way round, substituting empty strings for everything except the Rv word, but I didn't get very far. Could you please give me some suggestions?
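A possible reason the grep attempt is unhelpful: in a regular expression, "v*" means "zero or more v", not "v followed by anything", so "\\<Rv*\\>" can only match whole words such as "R" or "Rv". The sketch below compares it with an assumed alternative that uses a character class; the two test strings are taken from the examples above, and the names `old` and `new` are only illustrative.

```r
# Two of the example strings from above
x <- c("ppe16 Rv1135c MT1168", "PE_PGRS51 Rv3367")

# "v*" repeats only the letter "v", so the word boundary "\\>" cannot be
# reached while digits still follow "Rv"
old <- grepl("\\<Rv*\\>", x)

# Assumed fix: let letters and digits follow "Rv" up to the end of the word
new <- grepl("\\<Rv[[:alnum:]]*\\>", x)

print(old)
print(new)
```

With these two strings, the first pattern should find nothing and the second should match both.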

Thanks a lot,

Giulio


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 28 Jun 2010 - 23:19:17 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 28 Jun 2010 - 23:40:42 GMT.
