Re: [Rd] O(N log N) Kendall Tau

From: Dirk Eddelbuettel <edd_at_debian.org>
Date: Sun, 13 Dec 2009 10:27:03 -0600

On 13 December 2009 at 00:38, David Simcha wrote:
| I've noticed that the implementation of Kendall's Tau in R is O(N^2).
| The following reference describes how it can be done in O(N log N):
|
| A Computer Method for Calculating Kendall's Tau with Ungrouped Data
| William R. Knight
| Journal of the American Statistical Association, Vol. 61, No. 314, Part
| 1 (Jun., 1966), pp. 436-439
| http://www.jstor.org/pss/2282833

Interesting. I recently used the performance of cor(X, method="kendall") as a contrast to the 26-fold speedup when (!!) using gpuCor(X, method="kendall") from the nice gputools package by Josh Buckner and Mark Seligman. That was using the GPU in a QuadroFX 4800 with a 1206 x 477 matrix; the full example is in my most recent 'Intro to HPC with R' slides.

| I'm interested in contributing an implementation of this algorithm to
| R. However, I have a few questions:
|
| 1. Would it likely be accepted if well-implemented?
| 2. cov.c in the main/ directory of the R source tree seems to contain
| at least 4 different but nearly identical implementations of Kendall's
| Tau: cov_complete1, cov_complete2, cov_na1, and cov_na2. I can't tell
| why.

Combine your bottom-up code analysis with a top-down usage analysis -- open up R and read 'help(cov)'. There are different ways to deal with missing observations.

| The file is very difficult to read because of its lack of
| comments, (ab)use of macros and improper indentation. As I will
| probably need to modify this file, can some seasoned veteran please
| explain what the heck is going on in this file?

Again, I think reading the help file along with the code may prove beneficial. The R implementation is pretty consistent across files to you will have to get used to those C level macros if you want to modify code at that level. The R Extensions manual may prove helpful.

Lastly, as a matter of style, I find people are more likely to help you if you don't hit them first with a two-by-four of 'your code and style is horrid'.

Cheers, Dirk

-- 
Three out of two people have difficulties with fractions.

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Sun 13 Dec 2009 - 16:30:30 GMT

This archive was generated by hypermail 2.2.0 : Mon 14 Dec 2009 - 03:01:07 GMT