Re: [Rd] sub('^', .....) bugs (PR#7742)

From: Brian D Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 23 Mar 2005 - 10:34:30 GMT


The first is as designed: zero-length initial matches are ignored in the C code. (Don't ask me why it was designed that way.)

The second is not reproducible in the current sources, so it is probably already fixed by the fix to PR#7742.
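
Until the zero-length case behaves as documented, a prefix can be added without relying on a match of '^' alone. The following is only a minimal sketch, assuming the aim is simply to prepend a fixed string; the variable names (x, xc, a, b) are illustrative and not taken from the report. It also coerces explicitly with as.character(), so the perl = TRUE and perl = FALSE paths receive the same input.

```r
x <- 1:3

# Coerce explicitly rather than relying on auto-coercion inside sub().
xc <- as.character(x)

# Option 1: no regular expression at all.
a <- paste("var", xc, sep = "")

# Option 2: match a real (non-empty) first character and put it back,
# so the result does not depend on how a zero-length match is treated.
b <- sub("^(.)", "var\\1", xc)

print(a)
print(b)
```

Note that option 2 leaves empty strings unchanged, since there is no first character to match; option 1 prefixes them as well.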

On Wed, 23 Mar 2005, Gabor Grothendieck wrote:

> <maechler <at> stat.math.ethz.ch> writes:
>
> :
> : >>>>> "David" == David Forrest <drf5n <at> maplepark.com>
> : >>>>> on Tue, 22 Mar 2005 15:02:20 -0600 (CST) writes:
> :
> : David> According to help(sub), the ^ should match the
> : David> zero-length string at the beginning of a string:
> :
> : yes, indeed.
> :
> : David> sub('^','var',1:3) # "1" "2" "3"
> : David> sub('$','var',1:3) # "1var" "2var" "3var"
> :
> : David> # This generates what I expected from the first case:
> : David> sub('^.','var',11:13) # "var1" "var2" "var3"
> :
> : there are even more fishy things here:
> :
> : 1) In your cases, the integer 'x' argument is auto-coerced to
> : character, however that fails as soon as 'perl = TRUE' is used.
> :
> : > sub('^','v_', 1:3, perl=TRUE)
> : Error in sub.perl(pattern, replacement, x, ignore.case) :
> : invalid argument
> :
> : {one can argue that this is not a bug, since the help file asks
> : for 'x' to be a character vector; OTOH, we have
> : as.character(.) magic in many other places, i.e. quite
> : naturally here;
> : at least perl=TRUE and perl=FALSE should behave consistently.}
> :
> : 2) The 'perl=TRUE' case behaves even more problematically here:
> :
> : > sub('^','v_', LETTERS[1:3], perl=TRUE)
> : [1] "A\0e" "B\0J" "C\0S"
> : > sub('^','v_', LETTERS[1:3], perl=TRUE)
> : [1] "A\0J" "B\0P" "C\0J"
> : > sub('^','v_', LETTERS[1:3], perl=TRUE)
> : [1] "A\0\0" "B\0\0" "C\0m"
> : >
> :
> : i.e., the result is random nonsense.
> :
> : Note that this happens both for R-patched (2.0.1) and R-devel (2.1.0 alpha).
> :
> : ==> "forwarded" as bug report to R-bugs
>
> Also consider the following which may be related. #1 does not
> place an X before the first word and #2 causes R to hang.
>
> R> R.version.string # Windows XP
> [1] "R version 2.1.0, 2005-03-17"
>
> R> gsub("\\b", "X", "The quick brown fox") # 1
> [1] "The Xquick Xbrown Xfox"
>
> R> gsub("\\b", "X", "The quick brown fox", perl = TRUE) # 2
> ... hangs ...
>
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Wed Mar 23 21:37:53 2005
