[R] cor and missing values. Bug?

From: Jane Fridlyand (janef@stat.berkeley.edu)
Date: Wed 26 May 2004 - 09:17:57 EST

Message-id: <Pine.SOL.4.50.0405251607530.20070-100000@toto.Berkeley.EDU>

There seems to be a problem computing rank correlations when missing
values are present. I think it comes from the way the rank() function
works, but I am not sure how to get around it. By default, rank() places
missing values at the end, which skews the rank relationship between the
two vectors:


R : Copyright 2003, The R Foundation for Statistical Computing
Version 1.8.1 (2003-11-21), ISBN 3-900051-00-3

> vec1 <- 1:10
> vec2 <- 2*vec1
> vec1[c(1, 5)] <- NA
> cor(vec1, vec2, use="pair", method="pearson")
[1] 1
> cor(vec1[-c(1,5)], vec2[-c(1,5)], use="pair", method="pearson")
[1] 1
# Pearson is OK
> cor(vec1, vec2, use="pair", method="spearman")
[1] 0.3212121
> cor(vec1[-c(1,5)], vec2[-c(1,5)], use="pair", method="spearman")
[1] 1
> cor(vec1, vec2, use="complete", method="spearman")
[1] 0.3212121

Interestingly, the "complete" option, which should exclude missing values
entirely, does not fix the problem either. I think rank() must be applied
before the "use" argument takes effect (and, looking at the actual code of
cor, that does seem to be the case).
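The following R sketch supports that reading. Ranking each vector with rank()'s default na.last = TRUE gives the two NAs in vec1 the largest ranks (9 and 10). Taking an ordinary Pearson correlation of those ranks then reproduces 0.3212121 exactly. The names r1, r2, naive, ok and fixed are illustrative. Dropping incomplete pairs with complete.cases() before ranking is one possible workaround; it is not necessarily how cor() itself should handle this.

```r
# Same data as in the session above.
vec1 <- 1:10
vec2 <- 2 * vec1
vec1[c(1, 5)] <- NA

# rank() with its default na.last = TRUE does not drop the NAs:
# they receive the largest ranks, in order of appearance.
r1 <- rank(vec1)   # 9 1 2 3 10 4 5 6 7 8
r2 <- rank(vec2)   # 1 2 3 ... 10

# Spearman's coefficient is a Pearson correlation of ranks, so
# correlating these NA-polluted ranks gives the value cor() reported.
naive <- cor(r1, r2)
print(naive)       # 0.3212121

# Workaround (an assumption, not cor()'s own behavior): remove
# incomplete pairs first, then rank only the remaining values.
ok <- complete.cases(vec1, vec2)
fixed <- cor(vec1[ok], vec2[ok], method = "spearman")
print(fixed)       # 1
```

Note that ranking each vector separately with NAs kept as NA would still not be correct for pairwise deletion. The ranks of vec2 would then be computed over all ten values rather than over the eight complete pairs. For that reason, the sketch filters with complete.cases() before any ranking takes place.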

I looked through the archives but have not seen this reported. Is this a
bug in the rank correlations, or am I misinterpreting the intended behavior?

Thank you


Jane Fridlyand, Assistant Professor
Department of Epidemiology and Biostatistics
Center for Bioinformatics and Molecular Biostatistics
UCSF Comprehensive Cancer Center,
Box 0128 San Francisco, CA 94143-0128
Office: Room N224 Tel: (415)476-0168 Fax: (415)502-3179

R-help@stat.math.ethz.ch mailing list
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

This archive was generated by hypermail 2.1.3 : Mon 31 May 2004 - 23:05:12 EST