Re: [Rd] cor() fails with big dataframe

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Thu 16 Sep 2004 - 20:44:09 EST

>>>>> "Mayeul" == Mayeul KAUFFMANN <mayeul.kauffmann@tiscali.fr>
>>>>>     on Thu, 16 Sep 2004 01:23:09 +0200 writes:

    Mayeul> Hello,
    Mayeul> I have a big dataframe with *NO* na's (9 columns, 293380 rows).

    Mayeul> # doing
    Mayeul> memory.limit(size = 1000000000)
    Mayeul> cor(x)
    Mayeul> #gives
    Mayeul> Error in cor(x) : missing observations in cov/cor
    Mayeul> In addition: Warning message:
    Mayeul> NAs introduced by coercion

"by coercion" means there were other things *coerced* to NAs!

One of the biggest problems with R users (and other S users for that matter) is that if they get an error, they throw their hands up and ask for help, assuming the error message to be unintelligible. Whereas it *is* intelligible (slightly? ;-) more often than not ...
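To illustrate what that warning is reporting, here is a minimal sketch in base R. The vector `v` and its contents are made up for the illustration and are not taken from the original data; the point is only that a value which is not NA can become NA when it is converted to a number.

```r
# Hypothetical values: the second entry is not a valid number.
v <- c("1.5", "n/a", "3")

# as.numeric() turns whatever it cannot parse into NA and signals a warning.
# The warning is caught here so that its message can be inspected.
msg <- NULL
num <- withCallingHandlers(
  as.numeric(v),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(msg)         # the text of the coercion warning
print(num)         # the unparsable entry is now NA
print(is.na(v))    # the original strings contain no NA ...
print(is.na(num))  # ... the NA only appears after coercion
```

So a check for NAs on the original object can come back clean even though NAs show up once the values are coerced to numeric.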

    Mayeul> #I found the obvious workaround:
    Mayeul> COR <- matrix(rep(0, 81),9,9)
    Mayeul> for (i in 1:9) for (j in 1:9) {if (i>j) COR[i,j] <- cor (x[,i],x[,j])}
    Mayeul> #which works fine, with no warning

    Mayeul> #looks like a "cor()" bug.

Quite improbable.

The following works flawlessly for me,
and the only thing that takes a bit of time is the construction of x, not cor():

> n <- 300000
> set.seed(1)
> x <- as.data.frame(matrix(rnorm(n*9), n,9))
> cx <- cor(x)
> str(cx)
 num [1:9, 1:9] 1.00000 -0.00039 0.00113 0.00134 -0.00228 ...

    Mayeul> #I checked absence of NA's by
    Mayeul> x <- x[complete.cases(x),]
    Mayeul> summary(x)
    Mayeul> apply(x,2, function (x) (sum(is.na(x))))

    Mayeul> #I use R 1.9.1

What does

    sapply(x, function(u) all(is.finite(u)))

return?
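A related check, sketched here under the assumption (not confirmed anywhere in this thread) that one column was read in as text or as a factor rather than as numbers. The data frame below is hypothetical and stands in for the real `x`; the names `is_num` and `bad` are introduced only for this illustration.

```r
# Hypothetical data frame: column "b" holds text because of one bad entry.
x <- data.frame(a = c(1, 2, 3),
                b = c("4", "5,5", "6"),
                stringsAsFactors = FALSE)

# Which columns are actually numeric?
is_num <- sapply(x, is.numeric)
print(is_num)

# For the remaining columns, list the entries that do not parse as numbers.
bad <- lapply(x[!is_num], function(u) {
  u <- as.character(u)  # also covers factor columns
  u[is.na(suppressWarnings(as.numeric(u))) & !is.na(u)]
})
print(bad)
```

If `is_num` is TRUE for every column of the real data, this particular explanation does not apply and the cause has to be looked for elsewhere.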



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Sep 16 20:47:41 2004
