Re: [Rd] (PR#9437) pb in regular expression with the character "-"

From: Hin-Tak Leung <hin-tak.leung_at_cimr.cam.ac.uk>
Date: Mon 08 Jan 2007 - 12:15:24 GMT

May I chip in at this point: I agree the bug report was invalid, but many of the replies missed the point, as far as I can see. It wasn't the backslash escape that "Fan" was *mainly* confused about (though he obviously is confused about that too), but the use of the different brackets: [] and ().

He/She was expecting this:

         egrep '[a|\-|c]' foo.txt
to work the same as:

         egrep '(a|\-|c)' foo.txt

which it does not; the two are totally different. (He doesn't know the proper use of "|" either, so we have basically established that "Fan" doesn't understand how \, |, [] and () are used in regular expressions.)
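
A rough sketch of the difference, in case it helps. The sample lines and variable names below are made up for illustration, and LC_ALL=C is assumed so that ranges inside [] follow plain byte order. The patterns avoid the backslash on purpose, since POSIX leaves a backslash before an ordinary character undefined:

```shell
# Assumption: C locale, so bracket ranges follow byte order.
export LC_ALL=C

# Hypothetical sample input, one case per line.
sample=$(printf '%s\n' 'a-a' 'b' 'x|y')

# [] is a set of single characters. '|' is an ordinary member here,
# and '-' is literal because it comes first in the set.
class_hits=$(printf '%s\n' "$sample" | grep -E '[-|c]')

# () groups alternatives. Here '|' separates the alternatives "-" and "c".
group_hits=$(printf '%s\n' "$sample" | grep -E '(-|c)')

# A '-' between two members of [] forms a range: '|-|' is the range
# from '|' to '|', so this set is just 'd', '|' and 'c'.
range_hits=$(printf '%s\n' "$sample" | grep -E '[d|-|c]')

printf 'class: %s\n' "$class_hits"
printf 'group: %s\n' "$group_hits"
printf 'range: %s\n' "$range_hits"
```

With this input the [] form also picks up the line containing a literal "|", the () form does not, and the last pattern no longer matches "a-a" at all because the "-" has been absorbed into a range.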

HTL

ripley@stats.ox.ac.uk wrote:
> Both Solaris 8 grep and GNU grep 2.5.1 give
>
> gannet% cat > foo.txt
> a-a
> b
> gannet% egrep '[d|-|c]' foo.txt
> gannet% egrep '[-|c]' foo.txt
> a-a
>
> agreeing exactly with R (and the POSIX standard) and contradicting 'Fan'.
>
>
> On Thu, 4 Jan 2007, Fan wrote:
>

>> Let me detail a bit my bug report:
>>
>> the two commands ("expected" vs "strange") should return the
>> same result, the objective of the commands is to test the presence
>> of several characters, '-' included.
>>
>> The order in which we specify the different characters must not be
>> an issue, i.e., to test the presence of several characters, including
>> say char_1, the regular expressions [char_1|char_2|char_3] and 
>> [char_2|char_1|char_3] should play the same role. Other software
>> works just like this.
>>
>> What's reported is that R actually returns different results for the
>> character "-" (\- in a RE) depending on its position in the regular
>> expression, and the "perl" option would not be relevant.

>
> As described in the relevant international standard and R's own
> documentation.
>
>> Prof Brian Ripley wrote:
>>> Why do you think this is a bug in R?  You have not told us what you 
>>> expected, but the character range |-| contains only | .  Not agreeing with 
>>> your expectations (unstated or otherwise) is not a bug in R.
>>>
>>> \- is the same as -, and - is special in character classes.  (If it is 
>>> first or last it is treated literally.)  And | is not a metacharacter 
>>> inside a character class.  Also,
>>>
>>>> grep("[d\\-c]", c("a-a","b"))
>>> [1] 1 2
>>>
>>>> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
>>> [1] 1
>>>
>>> shows that escaping - works only in perl (which you will find from the 
>>> background references mentioned, e.g.
>>>
>>>   The interpretation of an ordinary character preceded by a backslash
>>>   ('\') is undefined.
>>>
>>> .)
>>>
>>> This is all carefully documented in ?regexp, e.g.
>>>
>>>      Patterns are described here as they would be printed by 'cat': do
>>>      remember that backslashes need to be doubled in entering R
>>>      character strings from the keyboard.
>>>
>>>
>>> This is not the first time you have wasted our resources with false bug 
>>> reports, so please show more respect for the R developers' time.
>>> You were also explicitly asked not to report on obsolete versions of R.
>>>
>>> On Wed, 3 Jan 2007, xiao.gang.fan1@libertysurf.fr wrote:
>>>
>>>> Full_Name: FAN
>>>> Version: 2.4.0
>>>> OS: Windows
>>>> Submission from: (NULL) (159.50.101.9)
>>>>
>>>>
>>>> These are expected:
>>>>
>>>>> grep("[\-|c]", c("a-a","b"))
>>>> [1] 1
>>>>
>>>>> gsub("[\-|c]", "&", c("a-a","b"))
>>>> [1] "a&a" "b"
>>>>
>>>> but these are strange:
>>>>
>>>>> grep("[d|\-|c]", c("a-a","b"))
>>>> integer(0)
>>>>
>>>>> gsub("[d|\-|c]", "&", c("a-a","b"))
>>>> [1] "a-a" "b"
>>>>
>>>> Thanks
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Mon Jan 08 23:25:07 2007
