Re: [Rd] regex to match word boundaries

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Thu 02 Dec 2004 - 18:49:02 EST

>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck@myway.com>
>>>>>     on Wed, 1 Dec 2004 21:05:59 -0500 (EST) writes:

    Gabor> Can someone verify whether or not this is a bug?

    Gabor> When I substitute all occurrences of "\\B" with "X", R
    Gabor> seems to correctly place an X at all non-word
    Gabor> boundaries (whether or not I specify perl), but "\\b"
    Gabor> does not seem to act on all complement positions:

    Gabor> > gsub("\\b", "X", "abc def") # nothing done
    Gabor> [1] "abc def"
    Gabor> > gsub("\\B", "X", "abc def") # as expected, I think
    Gabor> [1] "aXbXc dXeXf"
    Gabor> > gsub("\\b", "X", "abc def", perl = TRUE) # not as expected
    Gabor> [1] "abc Xdef"
    Gabor> > gsub("\\B", "X", "abc def", perl = TRUE) # as expected
    Gabor> [1] "aXbXc dXeXf"
    Gabor> > R.version.string # Windows 2000
    Gabor> [1] "R version 2.0.1, 2004-11-27"

I agree this looks "unfortunate".

Just to confirm:
1) I get the same on a Linux version
2) the real perl does behave differently and as
   you (and I) would have expected:

 $ echo 'abc def'| perl -pe 's/\b/X/g'
 XabcX XdefX
 $ echo 'abc def'| perl -pe 's/\B/X/g'
 aXbXc dXeXf

Also, from what I see, "\b" should behave the same regardless of whether perl = TRUE or FALSE.
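
To make the expected positions explicit, here is a small sketch that spells "\b" out with lookarounds: a word start is a position not preceded by \w but followed by \w, and a word end is the reverse. It assumes perl is on the PATH; the variable names with_b and with_lookaround are only illustrative and not from the original report.

```shell
# Sketch only: compare perl's built-in \b with an explicit lookaround
# definition of a word boundary. Assumes perl is installed.
# Variable names are hypothetical, chosen for this illustration.

# Built-in word-boundary assertion, as in the perl check above.
with_b=$(echo 'abc def' | perl -pe 's/\b/X/g')

# Word start: (?<!\w)(?=\w)   Word end: (?<=\w)(?!\w)
with_lookaround=$(echo 'abc def' | perl -pe 's/(?<!\w)(?=\w)|(?<=\w)(?!\w)/X/g')

echo "$with_b"
echo "$with_lookaround"
```

Both commands should print the same line in perl, namely the four boundary positions that gsub() would also be expected to hit.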

--
Martin

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Dec 02 18:55:05 2004
