Re: [Rd] pb in regular expression with the character "-" (PR#9437)

From: <>
Date: Thu 04 Jan 2007 - 21:18:08 GMT

>>>>> "FanX" == Xiao Gang Fan <> >>>>> on Thu, 04 Jan 2007 21:52:07 +0100 writes:

    FanX> Let me detail a bit my bug report: the two commands
    FanX> ("expected" vs "strange") should return the same
    FanX> result, the objective of the commands is to test the
    FanX> presence of several characters, '-' included.

    FanX> The order in which we specify the different characters
    FanX> must not be an issue, i.e., to test the presence of
    FanX> several characters, including say char_1, the regular
    FanX> expressions [char_1|char_2|char_3] and
    FanX> [char_2|char_1|char_3] should play the same role. Other
    FanX> software works just like this.
    FanX> What's reported is that R actually returns a different
    FanX> result for the character "-" (\- in a RE) depending on
    FanX> its position in the regular expression, and the
    FanX> "perl" option should not be relevant.
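An order-independent version of the test Fan describes can be sketched in R. The helper `char_class()` below is hypothetical (not part of base R). It builds a POSIX bracket expression from a vector of single characters, always moving any "-" to the end, where a bracket expression treats it literally, and it uses no "|" separators. It assumes the characters exclude "]", "^", "[" and backslash, whose placement rules it does not handle.

```r
# Hypothetical helper (not part of base R): build a bracket expression
# matching any one of `chars`, whatever order they are supplied in.
# Assumption: single characters other than "]", "^", "[" and "\\".
char_class <- function(chars) {
  stopifnot(length(chars) > 0, all(nchar(chars) == 1))
  if (any(chars %in% c("]", "^", "[", "\\")))
    stop("this sketch does not handle ']', '^', '[' or backslash")
  chars <- unique(chars)
  # A "-" placed last in a bracket expression is matched literally,
  # so it can never be read as a range operator.
  others <- setdiff(chars, "-")
  dash <- if ("-" %in% chars) "-" else ""
  paste0("[", paste(others, collapse = ""), dash, "]")
}

x <- c("a-a", "b")
char_class(c("d", "-", "c"))           # returns "[dc-]"
grep(char_class(c("d", "-", "c")), x)  # 1
grep(char_class(c("-", "c", "d")), x)  # 1, same result in another order
```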

Fan, it seems you haven't understood what Brian Ripley explained to you. Let me try to spell it out for you:

"\-" is *NOT* what you seem still to be thinking it is:

  > "\-"
  [1] "-"
  > identical("\-", "-")
  [1] TRUE
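Escape sequences are processed when R parses the string literal, before the regular-expression engine ever sees the pattern, so to hand the engine a backslash followed by "-" the literal must be written "\\-". A short check (current R releases reject "\-" outright with an "unrecognized escape" error, so it is not used below):

```r
# "\\" in an R string literal is a single backslash character.
s <- "\\-"
nchar(s)                         # 2: a backslash and a hyphen
identical(substr(s, 1, 1), "\\") # TRUE
identical(s, "-")                # FALSE, unlike "\-" in the session above
writeLines(s)                    # prints: \-
```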

This is all in the R-FAQ entry

  7.37 Why does backslash behave strangely inside strings?

and in several other places. And yes, please do read the R FAQ, and maybe
more documentation about R and "bug reporting", before your next bug report.

Consider my guesstimate:
For 99% of all R users, the amount of time they need to work pretty intensely with R before they find a bug in it is nowadays more than three years, and maybe even much more, such as their lifetime :-)

Martin Maechler, ETH Zurich

    FanX> Prof Brian Ripley wrote:

    >> Why do you think this is a bug in R?  You have not told
    >> us what you expected, but the character range |-|
    >> contains only | .  Not agreeing with your expectations
    >> (unstated or otherwise) is not a bug in R.
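Ripley's remark about the range |-| can be checked directly: between two characters inside brackets, "-" denotes a range, and a range whose endpoints are both "|" contains only "|". A small check with the default (non-perl) engine, assuming Fan's "\-" reached the engine as a plain "-", as it did in R 2.4.0:

```r
# "[|-|]" is a bracket expression holding the range from "|" to "|".
grepl("[|-|]", c("|", "-", "a"))   # TRUE FALSE FALSE
# Fan's "[d|\-|c]" reached the engine as "[d|-|c]":
# the set {d, |, c}, with no "-" in it.
grepl("[d|-|c]", c("a-a", "b"))    # FALSE FALSE
grepl("[d|-|c]", c("x|y", "cd"))   # TRUE TRUE
```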
    >> \- is the same as -, and - is special in character
    >> classes.  (If it is first or last it is treated
    >> literally.)  And | is not a metacharacter inside a
    >> character class.  Also,

    >> > grep("[d\\-c]", c("a-a","b"))
    >>  [1] 1 2
    >> > grep("[d\\-c]", c("a-a","b"), perl=TRUE)
    >>  [1] 1
    >> shows that escaping - works only in perl (which you will
    >> find from the background references mentioned, e.g.
    >>   "The interpretation of an ordinary character preceded
    >>   by a backslash ('\') is undefined.").
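With perl = TRUE the pattern goes to PCRE, where a backslash before a non-alphanumeric character inside a class makes that character literal, so the position of the escaped "-" no longer matters. A sketch (the default engine's result is left out, since it depends on how that engine treats a backslash inside brackets):

```r
x <- c("a-a", "b")
# In PCRE, \- inside a class (typed "\\-" in R) is a literal hyphen,
# so both classes below are the set {d, -, c}.
grep("[d\\-c]", x, perl = TRUE)                  # 1
grep("[\\-dc]", x, perl = TRUE)                  # 1
# Not a range: "b" is not matched, "c" is.
grepl("[d\\-c]", c("b", "c"), perl = TRUE)       # FALSE TRUE
```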
    >> This is all carefully documented in ?regexp, e.g.
    >>   "Patterns are described here as they would be printed by
    >>   'cat': do remember that backslashes need to be doubled
    >>   in entering R character strings from the keyboard."
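The difference between how a pattern is typed, how print() shows it, and what the regex engine receives can be seen with writeLines(), which, like cat(), outputs the string without re-escaping it:

```r
pat <- "[d\\-c]"
print(pat)       # [1] "[d\\-c]"  (escaped form, as it would be typed)
writeLines(pat)  # [d\-c]         (the characters the engine receives)
nchar(pat)       # 6
```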
    >> This is not the first time you have wasted our resources
    >> with false bug reports, so please show more respect for
    >> the R developers' time.  You were also explicitly asked
    >> not to report on obsolete versions of R.
    >> On Wed, 3 Jan 2007, wrote:

>>> Full_Name: FAN
>>> Version: 2.4.0
>>> OS: Windows
>>> Submission from: (NULL) (
>>> These are expected:
>>> > grep("[\-|c]", c("a-a","b"))
>>> [1] 1
>>> > gsub("[\-|c]", "&", c("a-a","b"))
>>> [1] "a&a" "b"
>>> but these are strange:
>>> > grep("[d|\-|c]", c("a-a","b"))
>>> integer(0)
>>> > gsub("[d|\-|c]", "&", c("a-a","b"))
>>> [1] "a-a" "b"
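For comparison, the "strange" commands behave as Fan expected once "-" is moved to the end (or the start) of the class and the "|" separators, which inside brackets would only add "|" to the set, are dropped. A sketch with the same input vector:

```r
x <- c("a-a", "b")
# "-" last (or first) inside brackets is a literal hyphen.
grep("[dc-]", x)        # 1
grep("[-dc]", x)        # 1
gsub("[dc-]", "&", x)   # "a&a" "b"
```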
>>> Thanks
>>> ______________________________________________
>>> mailing list
Received on Fri Jan 05 08:21:54 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 06 Jan 2007 - 05:31:01 GMT.
