Re: [R] Pairwise n for large correlation tables?

From: Christos Hatzis <christos_at_nuverabio.com>
Date: Tue 08 Aug 2006 - 12:44:03 EST


Hi,

You can use complete.cases().
It should run faster than the code you suggested.

See the following example:

x <- matrix(runif(30),10,3)

# introduce missing values

x[sample(1:10,3),1] <- NA
x[sample(1:10,3),2] <- NA
x[sample(1:10,3),3] <- NA

cor(x,use="pairwise.complete.obs")

n <- ncol(x)
n.na <- matrix(0, n, n)
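for (i in seq(1, n)) {

    # diagonal: number of non-missing values in column i
    n.na[i,i] <- sum( complete.cases(x[, i]) )

    # off-diagonal: rows where columns i and j are both non-missing
    for (j in seq(i+1, length=n-i)) {

        ok <- sum( complete.cases(x[, i], x[, j]) )
        n.na[i,j] <- n.na[j,i] <- ok

    }
}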
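
A possible loop-free variant, offered only as a sketch and not timed on your data: code each cell as 1 if present and 0 if NA, then take the cross-product of that indicator matrix. Entry [i,j] of the result is the number of rows in which columns i and j are both observed, i.e. the pairwise n. The names present, n.pair and n.loop, and the seed, are just illustrative choices of mine. The check at the end compares the result against counts obtained with complete.cases.

```r
set.seed(1)                        # arbitrary seed, only so the example is reproducible
x <- matrix(runif(30), 10, 3)

# introduce missing values, as in the example above
x[sample(1:10, 3), 1] <- NA
x[sample(1:10, 3), 2] <- NA
x[sample(1:10, 3), 3] <- NA

# 1 where a value is present, 0 where it is NA
present <- (!is.na(x)) * 1

# t(present) %*% present: entry [i,j] counts rows observed in both columns
n.pair <- crossprod(present)

# cross-check against counts from complete.cases
n <- ncol(x)
n.loop <- matrix(0, n, n)
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        n.loop[i, j] <- sum(complete.cases(x[, i], x[, j]))
    }
}
stopifnot(all(n.pair == n.loop))
```

For a data frame, as.matrix(df) could be passed to is.na in the same way. Since the whole computation is a single matrix product, I would expect it to scale reasonably to 100,000 rows by 7 columns, but that is an expectation, not a measurement.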
 

HTH -Christos

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch
[mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Adam D. I. Kramer
Sent: Monday, August 07, 2006 10:04 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Pairwise n for large correlation tables?

Hello,

I'm using a very large data set (n > 100,000 for 7 columns), for which I'm pretty happy dealing with pairwise-deleted correlations to populate my correlation table. E.g.,

a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")

...however, I am interested in the number of cases used to compute each cell of the correlation table. I am unable to find such a function via Google searches, so I wrote one of my own. This turns out to be highly inefficient (e.g., it takes much, MUCH longer than the correlations do). Any hints, regarding other functions to use or ways to make this speedier, would be much appreciated!

pairwise.n <- function(df=stop("Must provide data frame!")) {

   if (!is.data.frame(df)) {
     df <- as.data.frame(df)
   }
   colNum <- ncol(df)
   result <- matrix(data=NA, nrow=colNum, ncol=colNum,
                    dimnames=list(colnames(df), colnames(df)))

   for(i in 1:colNum) {

     for (j in i:colNum) {
       result[i,j] <- length(df[!is.na(df[i])&!is.na(df[j])])/colNum
     }

   }
   result
}
--
Adam D. I. Kramer
University of Oregon

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Tue Aug 08 14:01:08 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 08 Aug 2006 - 14:21:48 EST.
