RE: [R] Pearson corelation and p-value for matrix

From: Dren Scott <dren_scott_at_yahoo.com>
Date: Tue 19 Apr 2005 - 03:40:50 EST


Hi all,  

Thanks Andy, Mark and John for all the help. I really appreciate it. I'm new to both R and statistics, so please excuse any gaffes on my part.  

Essentially what I'm trying to do, is to evaluate for each row, how many other rows would have a p-value < 0.05. So, after I get my N x N p-value matrix, I'll just filter out values that are > 0.05.  

Each of my datasets (6000 rows x 100 columns) would consist of some NA's. The iterative procedure (cor.pvalues) proposed by John would yield the values, but it would take an inordinately long time (I have 50 of these datasets to process). The solution proposed by Andy, although fast, would not be able to incorporate the NA's.  

Is there any workaround for the NA's? Or possibly do you think I could try something else?  

Thanks very much.  

Dren

"Liaw, Andy" <andy_liaw@merck.com> wrote:
> From: John Fox
>
> Dear Andy,
>
> That's clearly much better -- and illustrates an effective
> strategy for
> vectorizing (or "matricizing") a computation. I think I'll
> add this to my
> list of programming examples. It might be a little dangerous
> to pass ...
> through to cor(), since someone could specify
> type="spearman", for example.

Ah, yes, the "..." isn't likely to help here! Also, it will only work correctly if there are no NA's, for example (or else the degree of freedom would be wrong).

Best,
Andy

> Thanks,
> John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
> > -----Original Message-----
> > From: Liaw, Andy [mailto:andy_liaw@merck.com]
> > Sent: Friday, April 15, 2005 9:51 PM
> > To: 'John Fox'; MSchwartz@medanalytics.com
> > Cc: 'R-Help'; 'Dren Scott'
> > Subject: RE: [R] Pearson corelation and p-value for matrix
> >
> > We can be a bit sneaky and `borrow' code from cor.test.default:
> >
> > cor.pval <- function(x, alternative="two-sided", ...) {
> > corMat <- cor(x, ...)
> > n <- nrow(x)
> > df <- n - 2
> > STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2)
> > p <- pt(STATISTIC, df)
> > p <- if (alternative == "less") {
> > p
> > } else if (alternative == "greater") {
> > 1 - p
> > } else 2 * pmin(p, 1 - p)
> > p
> > }
> >
> > The test:
> >
> > > system.time(c1 <- cor.pvals(X), gcFirst=TRUE)
> > [1] 13.19 0.01 13.58 NA NA
> > > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE)
> > [1] 6.22 0.00 6.42 NA NA
> > > system.time(c3 <- cor.pval(X), gcFirst=TRUE)
> > [1] 0.07 0.00 0.07 NA NA
> >
> > Cheers,
> > Andy
> >
> > > From: John Fox
> > >
> > > Dear Mark,
> > >
> > > I think that the reflex of trying to avoid loops in R is often
> > > mistaken, and so I decided to try to time the two
> approaches (on a
> > > 3GHz Windows XP system).
> > >
> > > I discovered, first, that there is a bug in your function -- you
> > > appear to have indexed rows instead of columns; fixing that:
> > >
> > > cor.pvals <- function(mat)
> > > {
> > > cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
> > > matrix(apply(cols, 1,
> > > function(x) cor.test(mat[, x[1]], mat[,
> > > x[2]])$p.value),
> > > ncol = ncol(mat))
> > > }
> > >
> > >
> > > My function is cor.pvalues and yours cor.pvals. This is
> for a data
> > > matrix with 1000 observations on 100 variables:
> > >
> > > > R <- diag(100)
> > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5
> > > > library(mvtnorm)
> > > > X <- rmvnorm(1000, sigma=R)
> > > > dim(X)
> > > [1] 1000 100
> > > >
> > > > system.time(cor.pvalues(X))
> > > [1] 5.53 0.00 5.53 NA NA
> > > >
> > > > system.time(cor.pvals(X))
> > > [1] 12.66 0.00 12.66 NA NA
> > > >
> > >
> > > I frankly didn't expect the advantage of my approach to be
> > this large,
> > > but there it is.
> > >
> > > Regards,
> > > John
> > >
> > > --------------------------------
> > > John Fox
> > > Department of Sociology
> > > McMaster University
> > > Hamilton, Ontario
> > > Canada L8S 4M4
> > > 905-525-9140x23604
> > > http://socserv.mcmaster.ca/jfox
> > > --------------------------------
> > >
> > > > -----Original Message-----
> > > > From: Marc Schwartz [mailto:MSchwartz@MedAnalytics.com]
> > > > Sent: Friday, April 15, 2005 7:08 PM
> > > > To: John Fox
> > > > Cc: 'Dren Scott'; R-Help
> > > > Subject: RE: [R] Pearson corelation and p-value for matrix
> > > >
> > > > Here is what might be a slightly more efficient way to
> > get to John's
> > > > question:
> > > >
> > > > cor.pvals <- function(mat)
> > > > {
> > > > rows <- expand.grid(1:nrow(mat), 1:nrow(mat))
> > > > matrix(apply(rows, 1,
> > > > function(x) cor.test(mat[x[1], ], mat[x[2],
> > > > ])$p.value),
> > > > ncol = nrow(mat))
> > > > }
> > > >
> > > > HTH,
> > > >
> > > > Marc Schwartz
> > > >
> > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote:
> > > > > Dear Dren,
> > > > >
> > > > > How about the following?
> > > > >
> > > > > cor.pvalues <- function(X){
> > > > > nc <- ncol(X)
> > > > > res <- matrix(0, nc, nc)
> > > > > for (i in 2:nc){
> > > > > for (j in 1:(i - 1)){
> > > > > res[i, j] <- res[j, i] <- cor.test(X[,i],
> > > X[,j])$p.value
> > > > > }
> > > > > }
> > > > > res
> > > > > }
> > > > >
> > > > > What one then does with all of those non-independent test
> > > > is another
> > > > > question, I guess.
> > > > >
> > > > > I hope this helps,
> > > > > John
> > > >
> > > > > > -----Original Message-----
> > > > > > From: r-help-bounces@stat.math.ethz.ch
> > > > > > [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of
> > > Dren Scott
> > > > > > Sent: Friday, April 15, 2005 4:33 PM
> > > > > > To: r-help@stat.math.ethz.ch
> > > > > > Subject: [R] Pearson corelation and p-value for matrix
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I was trying to evaluate the pearson correlation and
> > > the p-values
> > > > > > for an nxm matrix, where each row represents a vector.
> > > > One way to do
> > > > > > it would be to iterate through each row, and find its
> > > correlation
> > > > > > value( and the p-value) with respect to the other rows.
> > > Is there
> > > > > > some function by which I can use the matrix as input?
> > > > Ideally, the
> > > > > > output would be an nxn matrix, containing the p-values
> > > > between the
> > > > > > respective vectors.
> > > > > >
> > > > > > I have tried cor.test for the iterations, but
> couldn't find a
> > > > > > function that would take the matrix as input.
> > > > > >
> > > > > > Thanks for the help.
> > > > > >
> > > > > > Dren
> > > >
> > > >
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> > >
> > >
> > >
> >
> >
> >
> > --------------------------------------------------------------
> > ----------------
> > Notice: This e-mail message, together with any attachments,
> > contains information of Merck & Co., Inc. (One Merck Drive,
> > Whitehouse Station, New Jersey, USA 08889), and/or its
> > affiliates (which may be known outside the United States as
> > Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as
> > Banyu) that may be confidential, proprietary copyrighted
> > and/or legally privileged. It is intended solely for the use
> > of the individual or entity named on this message. If you
> > are not the intended recipient, and have received this
> > message in error, please notify us immediately by reply
> > e-mail and then delete it from your system.
> > --------------------------------------------------------------
> > ----------------
>
>
>



                

        [[alternative HTML version deleted]]



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Tue Apr 19 03:45:37 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:15 EST