Re: [Rd] pb in regular expression with the character "-" (PR#9437)

From: Herve Pages <hpages_at_fhcrc.org>
Date: Sat 06 Jan 2007 - 04:41:48 GMT

Hi all,

maechler@stat.math.ethz.ch wrote:
>
> Consider my guesstimate:
> For 99% of all R users, the amount of time they need working
> pretty intensely with R before they find a bug in it,
> is nowadays more than three years, and maybe even much more
> -- such as their lifetime :-)

Perhaps I belong to the 1% of unlucky users that don't have to wait that long ;-)

  > nchar("ťA", type = "bytes")
  [1] 3
  > nchar("ťA", type = "chars")
  [1] 2

  OK

Now:

  > regexpr("A", "ťA")
  [1] 2
  attr(,"match.length")
  [1] 1

  still OK

But:

  > regexpr("A", "ťA", useBytes=TRUE)
  [1] 2
  attr(,"match.length")
  [1] 1

  not OK anymore: with useBytes=TRUE the position should be counted in bytes, so 3 is expected, not 2
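
To spell out why 3 is the expected byte position, here is a minimal sketch. It assumes the string is UTF-8 encoded; the "\u0165" escape (for "ť") and the variable names are my own additions, used so the example does not depend on the session locale. It only inspects the string and does not call regexpr() itself.

```r
# "ť" is U+0165; written as an escape so the string is UTF-8
# regardless of the locale the session runs in (assumption of this sketch).
x <- "\u0165A"

# Raw bytes of the string: c5 a5 (the two bytes of "ť"), then 41 ("A").
bytes <- charToRaw(x)
print(bytes)

# Position of "A" counted in bytes and counted in characters.
byte_pos <- which(bytes == charToRaw("A"))
char_pos <- which(utf8ToInt(x) == utf8ToInt("A"))

print(byte_pos)  # 3
print(char_pos)  # 2
```

So "A" is the 2nd character but the 3rd byte, which is the position a byte-wise match should report.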

Let's try with fixed=TRUE:

  > regexpr("A", "ťA", useBytes=TRUE, fixed=TRUE)
  [1] 3
  attr(,"match.length")
  [1] 1

  much better!

H.

> sessionInfo()

R version 2.5.0 Under development (unstable) (2007-01-05 r40386)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=en_CA.UTF-8;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

This also happens in 2.4.0 and 2.4.1.



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sat Jan 06 15:44:56 2007
