[Rd] NAs by integer overflow in Spearman's test p-value (PR#8087)

From: <jtk_at_cmp.uea.ac.uk>
Date: Tue 23 Aug 2005 - 13:15:09 GMT


Full_Name: Jan T. Kim
Version: 2.1.0 (and better)
OS: Linux
Submission from: (NULL) (139.222.3.229)

The p value in Spearman's test is NA if the length of x exceeds 46340, due to an integer overflow, occurring if length(n) > sqrt(2^31):

> n <- 46341;
> set.seed(1);
> x <- runif(n);
> y <- runif(n);
> cor.test(x, y, method = "spearman");

            Spearman's rank correlation rho

    data: x and y
    S = 1.654965e+13, p-value = NA
    alternative hypothesis: true rho is not equal to 0     sample estimates:

            rho
    0.002199426

    Warning message:
    NAs produced by integer overflow in: n * n

The integer overflow occurs in src/library/stats/R/cor.test.R, and it can be fixed
by converting n to double appropriately (see *** fix label ***, lines 110 onwards
are shown):

                ## Use the test statistic S = sum(rank(x) - rank(y))^2
                ## and AS 89 for obtaining better p-values than via the
                ## simple normal approximation.
                ## In the case of no ties, S = (1-rho) * (n^3-n)/6.
                pspearman <- function(q, n, lower.tail = TRUE) {
                    if(n <= 1290) # n*(n^2 - 1) does not overflow
                        .C("prho",
                           as.integer(n),
                           as.double(q + 1),
                           p = double(1),
                           integer(1),
                           as.logical(lower.tail),
                           PACKAGE = "stats")$p
                    else { # for large n: aymptotic t_{n-2}
                        n <- as.double(n);  # *** fix ***
                        r <- 1 - 6 * q / (n*(n*n - 1))
                        pt(r / sqrt((1 - r^2)/(n-2)), df = n-2,
                           lower.tail= !lower.tail)
                    }
                }
                q <- round((n^3 - n) * (1 - r) / 6)
                STATISTIC <- c(S = q)
                PVAL <-
                    switch(alternative,
                           "two.sided" = {
                               p <- if(q > (n^3 - n) / 6)
                                   pspearman(q - 1, n, lower.tail = FALSE)
                               else
                                   pspearman(q, n, lower.tail = TRUE)
                               min(2 * p, 1)
                           },
                           "greater" = pspearman(q, n, lower.tail = TRUE),
                           "less" = pspearman(q - 1, n, lower.tail = FALSE))
                if(TIES)
                    warning("p-values may be incorrect due to ties")

Inserting the typecast only in the pspearman function is a minimally invasive fix -- alternatively, replacing at line 17

    n <- length(x)

with

    n <- as.double(length(x))

achieves the same fix and may take care of other unnecessary integer overflows, but it may also introduce new problems e.g. in .C calls etc.



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue Aug 23 23:35:24 2005

This archive was generated by hypermail 2.1.8 : Mon 20 Feb 2006 - 03:21:18 GMT