Re: [Rd] 1.6x speedup for requal() function (in R/src/main/unique.c)

From: Duncan Murdoch <murdoch.duncan_at_gmail.com>
Date: Thu, 01 Dec 2011 22:13:40 -0500

On 11-12-01 8:40 PM, Hervé Pagès wrote:
> Hi,
>
> FWIW:
>
> /* Taken from R/src/main/unique.c */
> static int requal(SEXP x, int i, SEXP y, int j)
> {
>     if (i < 0 || j < 0) return 0;
>     if (!ISNAN(REAL(x)[i]) && !ISNAN(REAL(y)[j]))
>         return (REAL(x)[i] == REAL(y)[j]);
>     else if (R_IsNA(REAL(x)[i]) && R_IsNA(REAL(y)[j])) return 1;
>     else if (R_IsNaN(REAL(x)[i]) && R_IsNaN(REAL(y)[j])) return 1;
>     else return 0;
> }
>
> /* Between 1.34x and 1.37x faster on my 64-bit Ubuntu laptop */
> static int requal2(SEXP x, int i, SEXP y, int j)
> {
>     double xi, yj;
>
>     if (i < 0 || j < 0) return 0;
>     xi = REAL(x)[i];
>     yj = REAL(y)[j];
>     if (!ISNAN(xi) && !ISNAN(yj)) return xi == yj;
>     if (R_IsNA(xi) && R_IsNA(yj)) return 1;
>     if (R_IsNaN(xi) && R_IsNaN(yj)) return 1;
>     return 0;
> }

That looks like a valid improvement.

>
> /* A further 1.18x speedup, so overall requal3() is about 1.6x
>    faster than requal() for me. requal3() uses simpler logic than
>    requal(), but this logic should be equivalent to requal()'s,
>    based on the following facts:
>    (a) If at least *one* of xi or yj is a number (i.e. not NA or
>        NaN), then xi and yj can be compared with xi == yj. They
>        don't need to *both* be numbers for this comparison to be
>        valid: any == involving NA or NaN is false, which is what
>        requal() returns when only one of them is a number.
>    (b) Otherwise (i.e. if neither of them is a number) each of
>        them is either NA or NaN (only 2 possible values for each),
>        so comparing them with R_IsNA(xi) == R_IsNA(yj) should do
>        the trick. */

I think this one is probably correct, but it's too tricky for my taste.

> static int requal3(SEXP x, int i, SEXP y, int j)
> {
>     double xi, yj;
>
>     if (i < 0 || j < 0) return 0;
>     xi = REAL(x)[i];
>     yj = REAL(y)[j];
>     if (!ISNAN(xi) || !ISNAN(yj)) return xi == yj;
>     return R_IsNA(xi) == R_IsNA(yj);
> }
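The equivalence argued in facts (a) and (b) can be spot-checked mechanically. What follows is a minimal, self-contained C sketch, not code from R: it works on plain doubles instead of a SEXP plus an index, and make_na(), is_na(), is_nan_not_na(), eq2(), eq3() and count_mismatches() are hypothetical names. The helpers stand in for R_NaReal, R_IsNA() and R_IsNaN() under the assumption that R's NA is a NaN whose low-order 32 bits equal 1954. Fact (a) relies on IEEE 754 semantics, where == is false whenever either operand is a NaN (NA included), so the sketch also assumes it is not compiled with options such as -ffast-math that relax those rules.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL stand-ins for R internals (assumption: R's NA is a NaN
   whose low-order 32 bits hold the value 1954). */
static double make_na(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2); /* 0x7A2 == 1954 */
    double d;
    memcpy(&d, &bits, sizeof d);
    assert(isnan(d)); /* the bit pattern must decode to a NaN */
    return d;
}

static int is_na(double x) /* plays the role of R_IsNA() */
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0xFFFFFFFF)) == 1954;
}

static int is_nan_not_na(double x) /* plays the role of R_IsNaN() */
{
    return isnan(x) && !is_na(x);
}

/* requal2() logic, on plain doubles instead of SEXP + index. */
static int eq2(double xi, double yj)
{
    if (!isnan(xi) && !isnan(yj)) return xi == yj;
    if (is_na(xi) && is_na(yj)) return 1;
    if (is_nan_not_na(xi) && is_nan_not_na(yj)) return 1;
    return 0;
}

/* requal3() logic, following facts (a) and (b) in Herve's comment. */
static int eq3(double xi, double yj)
{
    if (!isnan(xi) || !isnan(yj)) return xi == yj; /* (a) */
    return is_na(xi) == is_na(yj);                 /* (b) */
}

/* Compares eq2() and eq3() on every ordered pair drawn from a small set
   of representative values; returns the number of disagreements. */
static size_t count_mismatches(void)
{
    double vals[] = { 0.0, -0.0, 1.0, -2.5, INFINITY, -INFINITY, NAN, 0.0 };
    size_t n = sizeof vals / sizeof vals[0];
    size_t mismatches = 0;

    vals[n - 1] = make_na(); /* last slot holds NA */
    for (size_t a = 0; a < n; a++)
        for (size_t b = 0; b < n; b++)
            if (eq2(vals[a], vals[b]) != eq3(vals[a], vals[b]))
                mismatches++;
    return mismatches;
}
```

count_mismatches() is expected to return 0. The sample set of zero, negative zero, finite values, both infinities, NaN and NA reaches every branch of both functions, but it remains a spot check rather than a proof.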

Duncan Murdoch

>
> The logic of the cequal() function (in the same file) could also be
> cleaned up in a similar way, probably for an even greater speedup.
>
> This will benefit duplicated(), anyDuplicated() and unique() on numeric
> and complex vectors.
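To illustrate why this predicate matters for duplicated() and unique(), here is a naive O(n^2) sketch of duplicated() on a plain double array, using requal3()'s rule as the equality test. It is only an illustration: as far as I can tell, R's unique.c hashes the values and calls the equality function to compare candidates found through the hash table, rather than scanning pairwise. make_na_real(), is_na_real(), eq_real() and naive_duplicated() are hypothetical names, and NA is again assumed to be the NaN whose low-order 32 bits equal 1954.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds a NaN whose low-order 32 bits are 1954,
   assumed here to match R's NA. */
static double make_na_real(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2);
    double d;
    memcpy(&d, &bits, sizeof d);
    assert(isnan(d));
    return d;
}

/* Hypothetical stand-in for R_IsNA(). */
static int is_na_real(double x)
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0xFFFFFFFF)) == 1954;
}

/* requal3()-style equality on plain doubles. */
static int eq_real(double xi, double yj)
{
    if (!isnan(xi) || !isnan(yj)) return xi == yj;
    return is_na_real(xi) == is_na_real(yj);
}

/* Naive O(n^2) duplicated(): out[k] is 1 if x[k] equals some earlier
   element under eq_real(), else 0. */
static void naive_duplicated(const double *x, size_t n, int *out)
{
    for (size_t k = 0; k < n; k++) {
        out[k] = 0;
        for (size_t m = 0; m < k; m++) {
            if (eq_real(x[m], x[k])) {
                out[k] = 1;
                break;
            }
        }
    }
}
```

With the input {NA, NaN, NA, NaN, 0.0, -0.0, 1.0}, the sketch marks the third, fourth and sixth elements as duplicates: NA and NaN stay distinct from each other, each matches itself, and 0.0 == -0.0 under IEEE comparison. Every call to the predicate sits in the inner loop, which is why a cheaper requal() pays off across the whole vector.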
>
> Cheers,
> H.
>



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 02 Dec 2011 - 03:16:55 GMT

