Re: [Rd] regex to match word boundaries

From: Gabor Grothendieck <ggrothendieck_at_myway.com>
Date: Mon 06 Dec 2004 - 22:21:36 EST

Gabor Grothendieck <ggrothendieck <at> myway.com> writes:

:
: Can someone verify whether or not this is a bug.
:
: When I substitute all occurrences of "\\B" with "X"
: R seems to correctly place an X at all non-word boundaries
: (whether or not I specify perl) but "\\b" does not seem to
: act on all complement positions:
:
: > gsub("\\b", "X", "abc def") # nothing done
: [1] "abc def"
: > gsub("\\B", "X", "abc def") # as expected, I think
: [1] "aXbXc dXeXf"
: > gsub("\\b", "X", "abc def", perl = TRUE) # not as expected
: [1] "abc Xdef"
: > gsub("\\B", "X", "abc def", perl = TRUE) # as expected
: [1] "aXbXc dXeXf"
: > R.version.string # Windows 2000
: [1] "R version 2.0.1, 2004-11-27"

I have found another, possibly related, problem. In the examples above, \\B always worked as expected and only \\b did not. Here is an example where \\B does not work as expected either. In the first call below, every letter that is not first in its word is prefixed with X, as expected. In the second call, only alternate letters that are not first in their word are replaced with X, whereas one would have expected every letter that is not first in its word to be replaced.

R> gsub("\\B", "X", "The Quick Brown Fox") # works as expected
[1] "TXhXe QXuXiXcXk BXrXoXwXn FXoXx"

R> gsub("\\B.", "X", "The Quick Brown Fox", perl = TRUE) # problem
[1] "TXe QXiXk BXoXn FXx"

R> R.version.string # Windows XP
[1] "R version 2.0.1, 2004-11-04"
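
For reference, the results one would expect from the usual definition of a word boundary can be worked out by hand, without relying on the regex engine. The sketch below is only an illustration: boundary_flags, mark_positions and replace_after_nonboundary are hypothetical helpers (not part of R), and it assumes that a word character is a letter, digit or underscore.

```r
# Hypothetical helper: for a string of n characters, return n + 1 logical
# flags, one per zero-width position, TRUE where the position is a word
# boundary (exactly one neighbour is a word character).
boundary_flags <- function(s) {
  ch <- strsplit(s, "")[[1]]
  is_word <- grepl("^[[:alnum:]_]$", ch)  # assumed definition of a word character
  left  <- c(FALSE, is_word)              # character to the left of each position
  right <- c(is_word, FALSE)              # character to the right of each position
  left != right
}

# Hypothetical helper: insert `mark` at every boundary position
# (at_boundary = TRUE, like "\\b") or every non-boundary position
# (at_boundary = FALSE, like "\\B").
mark_positions <- function(s, at_boundary = TRUE, mark = "X") {
  ch <- strsplit(s, "")[[1]]
  marks <- ifelse(boundary_flags(s) == at_boundary, mark, "")
  n <- length(ch)
  paste0(paste0(marks[seq_len(n)], ch, collapse = ""), marks[n + 1])
}

# Hypothetical helper: replace every character that is preceded by a
# non-boundary position, which is what "\\B." is expected to match.
replace_after_nonboundary <- function(s, mark = "X") {
  ch <- strsplit(s, "")[[1]]
  nb <- !boundary_flags(s)[seq_along(ch)]
  ch[nb] <- mark
  paste(ch, collapse = "")
}

mark_positions("abc def", at_boundary = TRUE)     # "XabcX XdefX"
mark_positions("abc def", at_boundary = FALSE)    # "aXbXc dXeXf"
replace_after_nonboundary("The Quick Brown Fox")  # "TXX QXXXX BXXXX FXX"
```

Under these assumptions, the "\\B" results reported above agree with the hand computation, while the "\\b" results and the "\\B." result do not.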

By the way, do I have to submit a second bug report for this or is it possible to add this onto the previous one as a comment?



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon Dec 06 22:32:58 2004
