[Rd] 1.6x speedup for requal() function (in R/src/main/unique.c)

From: Hervé Pagès <hpages_at_fhcrc.org>
Date: Thu, 01 Dec 2011 17:40:34 -0800


Hi,

FWIW:

/* Taken from R/src/main/unique.c */
static int requal(SEXP x, int i, SEXP y, int j)
{
    if (i < 0 || j < 0) return 0;
    if (!ISNAN(REAL(x)[i]) && !ISNAN(REAL(y)[j]))
        return (REAL(x)[i] == REAL(y)[j]);
    else if (R_IsNA(REAL(x)[i]) && R_IsNA(REAL(y)[j])) return 1;
    else if (R_IsNaN(REAL(x)[i]) && R_IsNaN(REAL(y)[j])) return 1;
    else return 0;
}

/* Between 1.34x and 1.37x faster on my 64-bit Ubuntu laptop */
static int requal2(SEXP x, int i, SEXP y, int j)
{
    double xi, yj;

    if (i < 0 || j < 0) return 0;
    xi = REAL(x)[i];
    yj = REAL(y)[j];
    if (!ISNAN(xi) && !ISNAN(yj)) return xi == yj;
    if (R_IsNA(xi) && R_IsNA(yj)) return 1;
    if (R_IsNaN(xi) && R_IsNaN(yj)) return 1;
    return 0;
}

/* A further 1.18x speedup, so overall requal3() is about 1.6x
   faster than requal() for me. requal3() uses simpler logic than
   requal(), but this logic should be equivalent to the logic used
   by requal(), based on the following facts:

     (a) If *one* of xi or yj is a number (i.e. not NA or NaN),
         then xi and yj can be compared with xi == yj. They don't
         need to *both* be numbers for this comparison to be valid:
         if the other one is NA or NaN, xi == yj is false, which is
         exactly what requal() returns in that case.
     (b) Otherwise (i.e. if neither of them is a number), each of
         them is either NA or NaN (only 2 possible values for
         each), so comparing them with R_IsNA(xi) == R_IsNA(yj)
         should do the trick. */

static int requal3(SEXP x, int i, SEXP y, int j)
{
    double xi, yj;

    if (i < 0 || j < 0) return 0;
    xi = REAL(x)[i];
    yj = REAL(y)[j];
    if (!ISNAN(xi) || !ISNAN(yj)) return xi == yj;
    return R_IsNA(xi) == R_IsNA(yj);
}
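To sanity-check facts (a) and (b) outside of R, here is a minimal standalone sketch. It drops the SEXP/index arguments and works on plain doubles, uses isnan() in place of ISNAN(), and emulates R's NA with hypothetical helpers r_na() and is_r_na(). These assume NA_real_ is a NaN whose low 32-bit word is 1954, mirroring how R represents it internally; they are not R's own API. The check compares the requal() logic with the requal3() logic on every pair drawn from ordinary numbers, infinities, a plain NaN and an NA.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for R's NA_real_: a NaN whose low 32-bit
   word is 1954 (assumed to mirror R's internal NA bit pattern). */
static double r_na(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2); /* 0x7A2 == 1954 */
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Stand-in for R_IsNA(): a NaN whose low 32-bit word is 1954. */
static int is_r_na(double x)
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}

/* Stand-in for R_IsNaN(): a NaN that is not NA. */
static int is_r_nan(double x)
{
    return isnan(x) && !is_r_na(x);
}

/* Logic of the original requal(), on plain doubles. */
static int requal_logic(double xi, double yj)
{
    if (!isnan(xi) && !isnan(yj)) return xi == yj;
    if (is_r_na(xi) && is_r_na(yj)) return 1;
    if (is_r_nan(xi) && is_r_nan(yj)) return 1;
    return 0;
}

/* Logic of the proposed requal3(), on plain doubles. */
static int requal3_logic(double xi, double yj)
{
    if (!isnan(xi) || !isnan(yj)) return xi == yj; /* fact (a) */
    return is_r_na(xi) == is_r_na(yj);              /* fact (b) */
}

/* Assert that both predicates agree on every pair of test values. */
static void check_requal3_matches_requal(void)
{
    const double vals[] = { 0.0, -0.0, 1.0, -2.5, INFINITY, -INFINITY,
                            NAN, r_na() };
    size_t n = sizeof vals / sizeof vals[0];

    for (size_t a = 0; a < n; a++)
        for (size_t b = 0; b < n; b++)
            assert(requal_logic(vals[a], vals[b]) ==
                   requal3_logic(vals[a], vals[b]));
}
```

The value set covers every branch both functions can take: two numbers, a number against NA or NaN, two NAs, two plain NaNs, and NA against NaN. Since is_r_nan() is defined as "NaN and not NA", the both-NaN case reduces to is_r_na(xi) == is_r_na(yj) for any NaN payload, which is the point of fact (b).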

The logic of the cequal() function (in the same file) could also be cleaned up in a similar way, probably for an even greater speedup.

This will benefit duplicated(), anyDuplicated() and unique() on numeric and complex vectors.
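To illustrate why the equality test sits on a hot path, here is a deliberately naive, quadratic sketch of a duplicated()-style scan over a plain double array. It is not how R does it (R's implementation in unique.c is hash-based and calls the equality function when probing the table). naive_duplicated(), equal_dbl(), r_na() and is_r_na() are hypothetical helpers, and NA is emulated as a NaN whose low 32-bit word is 1954, an assumption mirroring R's internal representation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical NA: a NaN whose low 32-bit word is 1954. */
static double r_na(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical stand-in for R_IsNA(). */
static int is_r_na(double x)
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}

/* requal3()-style equality on plain doubles. */
static int equal_dbl(double xi, double yj)
{
    if (!isnan(xi) || !isnan(yj)) return xi == yj;
    return is_r_na(xi) == is_r_na(yj);
}

/* Quadratic duplicated(): dup[k] = 1 if x[k] equals some earlier
   element, else 0. Every comparison goes through equal_dbl(). */
static void naive_duplicated(const double *x, size_t n, int *dup)
{
    for (size_t k = 0; k < n; k++) {
        dup[k] = 0;
        for (size_t m = 0; m < k; m++) {
            if (equal_dbl(x[k], x[m])) {
                dup[k] = 1;
                break;
            }
        }
    }
}
```

On {1, NaN, NA, 1, NaN, NA, 2} this marks only the second 1, the second NaN and the second NA as duplicates, keeping NA and NaN distinct as R's duplicated() does. A hash-based version makes far fewer comparisons, but each one still goes through the equality predicate, which is where a faster requal() pays off.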

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages_at_fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 02 Dec 2011 - 01:47:46 GMT
