Re: [R] R's Spearman

From: Mendiburu, Felipe (CIP) <F.MENDIBURU_at_CGIAR.ORG>
Date: Thu, 31 May 2007 15:06:23 -0500


Dear Ray,

The R's Spearman calculated by R is correct for ties or nonties, which is not correct is the probability for the case of ties. I send to you formulates it for the correlation with ties, that is equal to R.

Regards,

Felipe de Mendiburu
Statistician

# Spearman correlation "rs" with ties or no ties rs<-function(x,y) {
d<-rank(x)-rank(y)

tx<-as.numeric(table(x))
ty<-as.numeric(table(y))
Lx<-sum((tx^3-tx)/12)
Ly<-sum((ty^3-ty)/12)

N<-length(x)
SX2<- (N^3-N)/12 - Lx
SY2<- (N^3-N)/12 - Ly
rs<- (SX2+SY2-sum(d^2))/(2*sqrt(SX2*SY2)) return(rs)
}

# Aplicacion
> cor(y[,1],y[,2],method="spearman")
[1] 0.2319084
> rs(y[,1],y[,2])
[1] 0.2319084

-----Original Message-----
From: r-help-bounces_at_stat.math.ethz.ch
[mailto:r-help-bounces_at_stat.math.ethz.ch]On Behalf Of Raymond Wan Sent: Monday, May 28, 2007 10:29 PM
To: r-help_at_stat.math.ethz.ch
Subject: [R] R's Spearman

Hi all,

I am trying to figure out the formula used by R's Spearman rho (using cor(method="spearman")) because I can't seem to get the same value as by calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know where. Basically, I am running these commands:

> y=read.table(file="tmp",header=TRUE,sep="\t")
> y
   IQ Hours

1 106     7
2  86     0

3 97 20
4 113 12
5 120 12
6 110 17
> cor(y[1],y[2],method="spearman")

       Hours
IQ 0.2319084

[it's an abbreviated example of one I took from Wikipedia]. I calculated by hand (apologies if the table looks strange when pasted into e-mail):

      IQ    Hours    rank(IQ)  rank(hours)    diff    diff^2
1    106    7        3         2                 1    1
2     86    0        1         1                 0    0
3     97    20       2         6                -4    16
4    113    12       5         3.5             1.5    2.25
5    120    12       6         3.5             2.5    6.25
6    110    17       4         5                -1    1
                                                      26.5
                           
                                              rho=    0.242857

where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))). I kept modifying the table and realized that the difference in result comes from ties. i.e., if I remove the tie in rows 4 and 5, I get the same result from both cor and calculating by hand. Perhaps I'm handling ties wrong...does anyone know how R does it or perhaps I need to change how I'm using it?

Thank you!

Ray



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 31 May 2007 - 21:01:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 31 May 2007 - 23:31:37 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.