Re: [Rd] pb in regular expression with the character "-" (PR#9437)

From: Fan <xiao.gang.fan1_at_libertysurf.fr>
Date: Thu 04 Jan 2007 - 20:52:07 GMT

Let me explain my bug report in a bit more detail:

The two sets of commands ("expected" vs. "strange") should return the same result: the objective of both is to test for the presence of any of several characters, '-' included.

The order in which the characters are specified should not matter. That is, to test for the presence of several characters, including, say, char_1, the regular expressions [char_1|char_2|char_3] and [char_2|char_1|char_3] should behave identically. Other software works this way.

What I am reporting is that R returns a different result for the character "-" (\- in a regular expression) depending on its position in the regular expression, and that the "perl" option is not relevant here.

Prof Brian Ripley wrote:
> Why do you think this is a bug in R? You have not told us what you
> expected, but the character range |-| contains only | . Not agreeing
> with your expectations (unstated or otherwise) is not a bug in R.
>
> \- is the same as -, and - is special in character classes. (If it is
> first or last it is treated literally.) And | is not a metacharacter
> inside a character class. Also,
>
>> grep("[d\\-c]", c("a-a","b"))
> [1] 1 2
>> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
> [1] 1
>
> shows that escaping - works only in perl (which you will find from the
> background references mentioned, e.g.
>
> The interpretation of an ordinary character preceded by a backslash
> ('\') is undefined.
>
> .)
>
> This is all carefully documented in ?regexp, e.g.
>
> Patterns are described here as they would be printed by 'cat': do
> remember that backslashes need to be doubled in entering R
> character strings from the keyboard.
>
>
> This is not the first time you have wasted our resources with false bug
> reports, so please show more respect for the R developers' time.
> You were also explicitly asked not to report on obsolete versions of R.
>
> On Wed, 3 Jan 2007, xiao.gang.fan1@libertysurf.fr wrote:
>
>> Full_Name: FAN
>> Version: 2.4.0
>> OS: Windows
>> Submission from: (NULL) (159.50.101.9)
>>
>>
>> These are expected:
>>
>>> grep("[\-|c]", c("a-a","b"))
>> [1] 1
>>> gsub("[\-|c]", "&", c("a-a","b"))
>> [1] "a&a" "b"
>>
>> but these are strange:
>>
>>> grep("[d|\-|c]", c("a-a","b"))
>> integer(0)
>>> gsub("[d|\-|c]", "&", c("a-a","b"))
>> [1] "a-a" "b"
>>
>> Thanks
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
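The points made above about bracket expressions can be checked directly. This is a minimal sketch, assuming a current version of R, where a string literal such as "\-" (as used in the original report) is rejected as an unrecognized escape; the patterns are therefore written without it. It shows why the reported pattern matches nothing and two ways to match a literal hyphen regardless of where it appears in the class.

```r
x <- c("a-a", "b")

# The "strange" pattern from the report, minus the invalid string escape:
# inside brackets, '|-|' is a range containing only '|', so the class is
# {d, |, c} and neither element matches.
grep("[d|-|c]", x)               # integer(0)

# A hyphen placed first or last in a bracket expression is literal.
# No '|' separators are needed: '|' is an ordinary character in brackets.
grep("[-dc]", x)                 # 1
grep("[dc-]", x)                 # 1
gsub("[dc-]", "&", x)            # "a&a" "b"

# With perl = TRUE, a backslash-escaped hyphen is also literal,
# so its position in the class no longer matters.
grep("[d\\-c]", x, perl = TRUE)  # 1
```

Placing the hyphen last works with both the default engine and `perl = TRUE`, so it is the more portable of the two choices.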

Received on Fri Jan 05 07:56:42 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 04 Jan 2007 - 21:31:05 GMT.
