[Rd] Fix for bug in match()

From: Andreas Borg <borg_at_imbei.uni-mainz.de>
Date: Mon, 18 Jan 2010 13:30:51 +0100


Hello all,

I posted the following bug last week:

# These calls work correctly:

match(c("A", "B", "C"), c("A","C"), incomparables=NA) # okay
match(c("A", "B", "C"), "A") # okay
match("A", c("A", "B"), incomparables=NA) # okay

# This one causes R to hang:
match(c("A", "B", "C"), "A", incomparables=NA)

I found the reason in the hash table implementation in unique.c. Values in the table argument of match are stored in a hash table. Incomparables are then removed by (potentially multiple) calls to removeEntry():

static void removeEntry(SEXP table, SEXP x, int indx, HashData *d) {

    int i, *h;

    h = INTEGER(d->HashTable);
    i = d->hash(x, indx, d);
    while (h[i] != NIL) {
    if (d->equal(table, h[i], x, indx)) {

        h[i] = NA_INTEGER;  /* < 0, only index values are inserted */
        return;

}

    i = (i + 1) % d->M;
}

    h[i] = NA_INTEGER;
}

Removing a value x sets the corresponding cell to NA_INTEGER. If x is not element of the table, the cell where it would have been is set from NIL (-1) to NA_INTEGER. Therefore, subsequent calls to removeEntry() with values that are not element of the table can remove all initial NIL values from the table cells. Another call of removeEntry() or Lookup() then leads to an infinte loop because the condition

while (h[i] != NIL)

is never false.

As a fix, I propose to reset cells to NIL when removing values, which leads to the following definition:

static void removeEntry(SEXP table, SEXP x, int indx, HashData *d) {

    int i, *h;

    h = INTEGER(d->HashTable);
    i = d->hash(x, indx, d);
    while (h[i] != NIL) {
    if (d->equal(table, h[i], x, indx)) {

        h[i] = NIL;  /* < 0, only index values are inserted */
        return;

}

    i = (i + 1) % d->M;
}

}

I would have submitted a patch file, but I couldn't checkout from the svn repository.

Cheers,

Andreas

-- 
Andreas Borg
Medizinische Informatik

UNIVERSIT腡SMEDIZIN
der Johannes Gutenberg-Universit鋞
Institut f黵 Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Stra遝 69, 55131 Mainz
www.imbei.uni-mainz.de

Telefon +49 (0) 6131 175062
E-Mail: borg_at_imbei.uni-mainz.de

Diese E-Mail enth鋖t vertrauliche und/oder rechtlich gesch黷zte Informationen. Wenn Sie nicht der
richtige Adressat sind oder diese E-Mail irrt黰lich erhalten haben, informieren Sie bitte sofort den
Absender und l鰏chen Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe
dieser Mail und der darin enthaltenen Informationen ist nicht gestattet.

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 18 Jan 2010 - 12:38:47 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 18 Jan 2010 - 15:00:14 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive