Re: [Rd] pb in regular expression with the character "-" (PR#9437)

From: <ripley_at_stats.ox.ac.uk>
Date: Thu 04 Jan 2007 - 23:06:00 GMT


Both Solaris 8 grep and GNU grep 2.5.1 give

gannet% cat > foo.txt
a-a
b
gannet% egrep '[d|-|c]' foo.txt
gannet% egrep '[-|c]' foo.txt
a-a

agreeing exactly with R (and the POSIX standard) and contradicting 'Fan'.
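
To make the parsing explicit, here is a minimal sketch (not part of the original session), assuming a POSIX-conforming grep run in the C locale. The sample lines, including the extra line containing '|', and the variable names are made up for illustration. Inside a bracket expression '|' is an ordinary character, so '|-|' reads as a range whose only member is '|', and '-' itself is not in the class.

```shell
# Assumption: a POSIX grep; LC_ALL=C keeps range handling locale-independent.
export LC_ALL=C

# Hypothetical sample input: a line with '-', a plain line, a line with '|'.
sample=$(printf 'a-a\nb\nx|y\n')

# '|-|' is a range from '|' to '|', so this class holds only d, | and c.
with_range=$(printf '%s\n' "$sample" | grep -E '[d|-|c]')

# Here '-' comes first in the class, so it is taken literally.
hyphen_first=$(printf '%s\n' "$sample" | grep -E '[-|c]')

printf 'matches for [d|-|c]:\n%s\n' "$with_range"
printf 'matches for [-|c]:\n%s\n' "$hyphen_first"
```

If the range reading is right, the first pattern should select only the line containing '|', while the second should select that line and a-a as well.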

On Thu, 4 Jan 2007, Fan wrote:

> Let me detail a bit my bug report:
>
> the two commands ("expected" vs "strange") should return the
> same result, the objective of the commands is to test the presence
> of several characters, '-' included.
>
> The order in which we specify the different characters must not be
> an issue, i.e., to test the presence of several characters, including
> say char_1, the regular expressions [char_1|char_2|char_3] and
> [char_2|char_1|char_3] should play the same role. Other software
> works just like this.
>
> What's reported is that R actually returns a different result for the
> character "-" (\- in a RE) depending on its position in the regular
> expression, and the "perl" option would not be relevant.

That position dependence is exactly what the relevant international standard (POSIX) and R's own documentation (?regexp) describe.
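
For the stated goal of testing for the presence of several characters including '-', one portable spelling puts the hyphen first or last in the bracket expression and drops the '|' separators. A rough sketch, assuming a POSIX grep in the C locale (the sample lines and variable names are illustrative, not taken from the report):

```shell
# Assumption: a POSIX grep; LC_ALL=C keeps the behaviour locale-independent.
export LC_ALL=C

# Hypothetical sample input.
sample=$(printf 'a-a\nb\ncd\n')

# Hyphen last: literal '-', so the class is d, c and '-'.
hyphen_last=$(printf '%s\n' "$sample" | grep -E '[dc-]')

# Hyphen first, other characters reordered: same set of characters.
hyphen_lead=$(printf '%s\n' "$sample" | grep -E '[-cd]')

printf 'matches for [dc-]:\n%s\n' "$hyphen_last"
printf 'matches for [-cd]:\n%s\n' "$hyphen_lead"
```

With the hyphen at either end, the order of the remaining characters should not matter. The corresponding R call would presumably be grep("[dc-]", c("a-a","b")).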

> Prof Brian Ripley wrote:
>> Why do you think this is a bug in R? You have not told us what you
>> expected, but the character range |-| contains only |. Not agreeing with
>> your expectations (unstated or otherwise) is not a bug in R.
>>
>> \- is the same as -, and - is special in character classes. (If it is
>> first or last it is treated literally.) And | is not a metacharacter
>> inside a character class. Also,
>>
>>> grep("[d\\-c]", c("a-a","b"))
>>
>> [1] 1 2
>>
>>> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
>>
>> [1] 1
>>
>> shows that escaping - works only with perl=TRUE (which you will find from
>> the background references mentioned, e.g.
>>
>> The interpretation of an ordinary character preceded by a backslash
>> ('\') is undefined.
>>
>> .)
>>
>> This is all carefully documented in ?regexp, e.g.
>>
>> Patterns are described here as they would be printed by 'cat': do
>> remember that backslashes need to be doubled in entering R
>> character strings from the keyboard.
>>
>>
>> This is not the first time you have wasted our resources with false bug
>> reports, so please show more respect for the R developers' time.
>> You were also explicitly asked not to report on obsolete versions of R.
>>
>> On Wed, 3 Jan 2007, xiao.gang.fan1@libertysurf.fr wrote:
>>
>>> Full_Name: FAN
>>> Version: 2.4.0
>>> OS: Windows
>>> Submission from: (NULL) (159.50.101.9)
>>>
>>>
>>> These are expected:
>>>
>>>> grep("[\-|c]", c("a-a","b"))
>>>
>>> [1] 1
>>>
>>>> gsub("[\-|c]", "&", c("a-a","b"))
>>>
>>> [1] "a&a" "b"
>>>
>>> but these are strange:
>>>
>>>> grep("[d|\-|c]", c("a-a","b"))
>>>
>>> integer(0)
>>>
>>>> gsub("[d|\-|c]", "&", c("a-a","b"))
>>>
>>> [1] "a-a" "b"
>>>
>>> Thanks
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri Jan 05 10:14:13 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.