Re: [R] bug? in stats::cor for use=complete.obs with NAs

From: Peter Ehlers <ehlers_at_ucalgary.ca>
Date: Thu, 10 Jun 2010 17:02:36 -0600

I don't think that this would be considered a bug. The reason for the discrepancy between use="complete.obs" and use="pairwise.complete.obs" for the case of the Spearman correlation of two vectors x, y is this:

"pairwise" does complete.cases(x,y) and then ranks; this is also what's done in cor.test().

"complete" ranks first (keeping NAs via the na.last="keep" argument to rank()) and then does complete.cases(ranked.x,ranked.y) on the ranked data. This can obviously lead to a different set of ranks being correlated than those for "pairwise".

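As a rough sketch of the two orderings described above, using the vectors from the original post: the variable names here are made up for illustration, and the code mimics the description rather than the actual internals of cor().

```r
a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)

# Ordering 1 ("pairwise" as described): drop incomplete cases, then rank.
ok <- complete.cases(a, b)
rho_delete_then_rank <- cor(rank(a[ok]), rank(b[ok]))

# Ordering 2 ("complete" as described): rank with NAs kept, then drop cases.
ra <- rank(a, na.last = "keep")
rb <- rank(b, na.last = "keep")
ok_ranked <- complete.cases(ra, rb)
rho_rank_then_delete <- cor(ra[ok_ranked], rb[ok_ranked])

print(rho_delete_then_rank)   # 0.7777778 (= 7/9)
print(rho_rank_then_delete)   # 0.8164966 (= sqrt(2/3))
```

These two values match the two results reported in the original post, which is consistent with the explanation that only the order of ranking and deletion differs.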
I must admit that I wasn't aware that R does this and I don't know the rationale for it. The help page says:

    If use is "complete.obs" then missing values are
    handled by casewise deletion ...

which is not clear on the order of ranking and deletion, but further down the page:

    Note that "spearman" basically computes cor(R(x), R(y))
    (or cov(.,.)) where R(u) := rank(u, na.last="keep").
    In the case of missing values, the ranks are calculated
    depending on the value of use, either based on complete
    observations, or based on pairwise completeness with
    reranking for each pair.

I guess that this implies that, for "complete", the ranking occurs before the casewise deletion (else why the na.last="keep"?).
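
To illustrate why na.last="keep" matters here, the following small sketch (again with made-up variable names) shows what rank() returns for the posted vectors. It only demonstrates rank() itself and is not a claim about how cor() is implemented.

```r
a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)

# Default na.last = TRUE: the NA is ranked after all other values.
rank_default <- rank(a)
# na.last = "keep": the NA stays NA, so it can be deleted casewise later.
rank_keep <- rank(a, na.last = "keep")

# Ranks of b over all five values, versus only the four values that are
# paired with a non-missing a.
rank_b_all <- rank(b)
rank_b_complete <- rank(b[!is.na(a)])

print(rank_default)      # 1.5 4.0 5.0 1.5 3.0
print(rank_keep)         # 1.5 4.0  NA 1.5 3.0
print(rank_b_all)        # 2 4 2 2 5
print(rank_b_complete)   # 1.5 3.0 1.5 4.0
```

After the third case is dropped, the ranks of b from the full vector (2, 4, 2, 5) are not the ranks one gets from the four remaining values (1.5, 3, 1.5, 4), which is where the two correlations part ways.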

If anyone knows the rationale and/or can give a reference, I'd be glad to receive such.

   -Peter Ehlers

On 2010-06-09 11:36, hugh.genin_at_thomsonreuters.com wrote:
> Arrrrr,
>
> I think I've found a bug in the behavior of the stats::cor function when
> NAs are present, but in case I'm missing something, could you look over
> this example and let me know what you think:
>
>
>> a = c(1,3,NA,1,2)
>> b = c(1,2,1,1,4)
>> cor(a,b,method="spearman", use="complete.obs")
> [1] 0.8164966
>> cor(a,b,method="spearman", use="pairwise.complete.obs")
> [1] 0.7777778
>
> My understanding is that, when the inputs are vectors (but not
> necessarily when they're matrices), the "complete.obs" and
> "pairwise.complete.obs" arguments should give identical spearman
> correlations. The above example clearly shows they do not in my version
> of R (2.11.1). However, in cor.test, they do:
>
>
>> cor.test(a,b,method="spearman", use="complete.obs")
>
> Spearman's rank correlation rho
>
> data: a and b
> S = 2.2222, p-value = 0.2222
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
> rho
> 0.7777778
>
>
> So cor and cor.test do not agree, which seems very likely to be a bug.
> When calculating by hand, I also get 0.7777778. Additionally, when
> using an old version of R (2.5.0), both the complete.obs and
> pairwise.complete.obs versions give 0.7777778, which strongly suggests
> that either 2.5.0 or 2.11.1 has a bug in it. Is this a bug? If so, has it
> already been reported? (I found a related but confusing email thread
> from 2004 in the R archives, but I did not find the resolution to that
> bug report).
>
>
> Additional info:
> Platform = Windows XP
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
>> Sys.getlocale()
> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
>
> Thanks,
>
> --Hugh
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 10 Jun 2010 - 23:05:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Jun 2010 - 01:10:29 GMT.