Re: [Rd] Bug in agrep computing edit distance?

From: Dickison, Daniel <ddickison_at_carnegielearning.com>
Date: Thu, 18 Nov 2010 10:56:57 -0500


A follow-up to this: I got R to compile, and the following patch seems to fix the issue. (I don't think my previous attachment went through, so the patch is pasted inline.)

There is still a quirk: tail insertions seem to cost 1 extra, and I'm not sure why. In the first example below, elements 3 and 5 should also match, and in the second, element 5 should also match, but they don't unless max.distance=3:

> agrep("x", c("x", "y", "ax1", "abx", "x12", "ax12", "abx1"),
+       max.distance=2)
[1] 1 2 4

> agrep("ax1", c("x", "y", "ax1", "abx", "x12", "ax12", "abx1"),
+       max.distance=2)
[1] 1 3 4 6 7
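
The expected matches can be cross-checked independently of agrep(). This is only a sketch: it assumes a more recent version of R in which utils::adist() is available (that function is not part of the patch or of the original report), and it uses the default unit costs for insertions, deletions, and substitutions on whole strings.

```r
## Reference edit distances from utils::adist(), independent of agrep().
## Assumption: adist() is available and its default unit costs are used.
x <- c("x", "y", "ax1", "abx", "x12", "ax12", "abx1")

d_x   <- as.vector(adist("x", x))
d_ax1 <- as.vector(adist("ax1", x))

which(d_x <= 2)    # [1] 1 2 3 4 5
which(d_ax1 <= 2)  # [1] 1 3 4 5 6 7
```

Under that assumption, elements 3 and 5 are within distance 2 of "x", and element 5 is within distance 2 of "ax1", which is what whole-string matching should report.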

In any case, I think this is more in line with the documentation. I'm very new to hacking on R so please let me know if this isn't the right way to submit patches...

Daniel

Index: src/library/base/R/grep.R


     n <- nchar(pattern, "c")
     if(is.na(n)) stop("invalid multibyte string for 'pattern'")
+
+    ## make pattern match the whole string
+    pattern <- gsub("\\", "\\\\", pattern, fixed=TRUE)
+    pattern <- paste("^", pattern, "$", sep="")
+
     if(!is.list(max.distance)) {
         if(!is.numeric(max.distance) || (max.distance < 0))
             stop("'max.distance' must be non-negative")
Index: src/main/agrep.c
     checkArity(op, args);
     pat = CAR(args); args = CDR(args);
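
For readers who want to see the effect of the grep.R change in isolation, here is a minimal sketch of the same two steps wrapped in a helper. The name anchor_pattern is hypothetical and exists only for illustration; the patch itself modifies `pattern` in place inside agrep().

```r
## Hypothetical helper mirroring the two lines added to grep.R:
## double any backslashes, then wrap the pattern in "^" and "$"
## so that it has to match the whole string.
anchor_pattern <- function(pattern) {
    pattern <- gsub("\\", "\\\\", pattern, fixed=TRUE)
    paste("^", pattern, "$", sep="")
}

anchor_pattern("ax1")  # "^ax1$"
```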



Daniel Dickison
Research Programmer
ddickison_at_carnegielearning.com
Toll Free: (888) 851-7094 x103
FAX: (412) 690-2444

Revolutionary Math Curricula. Revolutionary Results.

Carnegie Learning, Inc. | 437 Grant St. 20th Floor | Pittsburgh, PA 15219 www.carnegielearning.com



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Thu 18 Nov 2010 - 16:00:15 GMT
