Re: [R] How to Get Categorical Correlation Coefficient

From: Kum-Hoe Hwang <phdhwang_at_gmail.com>
Date: Thu 12 Oct 2006 - 10:08:56 GMT

I have added a new, corrected correlation; the output follows:

> nrow(sdi)
[1] 65613

> print(corridor1[65600:65613])
 [1] C C C C F F F F B B F F B B
Levels: B C D E A F

> print(corridor2[65600:65613])
 [1] 4 4 4 4 2 2 2 2 1 1 2 2 1 1

> summary(corridor1)
    B     C     D     E     A     F
15092 13456  6652  1611  1796 27006

> summary(corridor2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.0     1.0     2.0     2.3     3.0     5.0

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       0       0       0       0       0

> table(corridor1,corridor2)
         corridor2
corridor1     0     1     2     3     4     5
        B     0 15092     0     0     0     0
        C     0     0     0     0 13456     0
        D     0     0     0  6652     0     0
        E     0     0     0     0     0  1611
        A  1796     0     0     0     0     0
        F     0     0 27006     0     0     0
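
Each row of the table has a single non-zero cell, so the two variables appear to describe the same grouping under different codes. The sketch below reproduces that pattern on a tiny made-up sample. The names toy1, toy2 and map are hypothetical stand-ins, and the pairing in map is only read off the table above. It shows that as.integer() on a factor returns level positions, which need not agree with the numeric labels:

```r
# Toy data mirroring the level order and pairing shown in the table
# (toy1/toy2 are hypothetical stand-ins for corridor1/corridor2).
toy1 <- factor(c("B", "C", "D", "E", "A", "F", "F", "B"),
               levels = c("B", "C", "D", "E", "A", "F"))
map  <- c(A = 0, B = 1, F = 2, D = 3, C = 4, E = 5)  # pairing read off the table
toy2 <- unname(map[as.character(toy1)])

# as.integer() on a factor gives the position of each level, not its label
codes1 <- as.integer(toy1)

# One-to-one check: every row and column of the cross-table has one non-zero cell
tab <- table(toy1, toy2)
one_to_one <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)

# Side-by-side view of the two codings
print(data.frame(level = as.character(toy1), codes1, toy2))
print(one_to_one)
```

Because the two codings order the groups differently, a Pearson correlation against a third variable can legitimately differ between them even though the grouping is identical.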

The following results still give different correlation coefficients. Are there any functions or packages for computing a correlation with a categorical variable?

> cor(jh1_1, corridor1)
[1] 0.02753303

> cor(jh1_1, as.factor(corridor2))
[1] -0.3682788
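
One possible answer, assuming jh1_1 is numeric and the corridor variable is purely nominal, is the correlation ratio (eta squared) from a one-way model, which does not depend on how the groups are labelled. The sketch below uses simulated data only. The names y, g1, g2 and eta_sq are hypothetical and not taken from the data above:

```r
set.seed(1)  # simulated data; y is a hypothetical stand-in for jh1_1
g1  <- factor(sample(c("A", "B", "C", "D", "E", "F"), 300, replace = TRUE))
map <- c(A = 0, B = 1, F = 2, D = 3, C = 4, E = 5)
g2  <- factor(unname(map[as.character(g1)]))   # same groups, relabelled
y   <- rnorm(300, mean = as.integer(g1))

# Correlation ratio (eta squared): share of the variance of y explained by group
eta_sq <- function(y, g) summary(lm(y ~ g))$r.squared

eta1 <- eta_sq(y, g1)
eta2 <- eta_sq(y, g2)
print(c(eta1, eta2))   # the two values agree up to rounding

# Pearson correlation on the integer codes, by contrast, depends on the coding
r1 <- cor(y, as.integer(g1))
r2 <- cor(y, as.integer(g2))
print(c(r1, r2))
```

If both variables were categorical, a measure built on the chi-squared statistic of their cross-table would be the analogous choice. The explicit as.integer() calls follow the advice quoted below about not passing factors straight to cor().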

On 12 Oct 2006 10:25:33 +0200, Peter Dalgaard <p.dalgaard@biostat.ku.dk> wrote:
> "Kum-Hoe Hwang" <phdhwang@gmail.com> writes:
> > Howdy Gurus !
> >
> > I have a different correlation result from the same data. The
> > "corridor1" string variable is expressed
> > as a number like the "corridor2" number variable.
> > --------------------------------------------------------------------------
> > > levels(corridor1)
> > [1] "A" "B" "C" "D" "E" "F"
> > > levels(as.factor(corridor2))
> > [1] "0" "1" "2" "3" "4"
> > ------------------------------------------------------------------------------------------
> > I get the following correlation results using the cor() function.
> > ------------------------------------------------------------------------------------------
> > > cor(jh1_1, as.factor(corridor1))
> > [1] 0.01528538
> > > cor(jh1_1, as.factor(corridor2))
> > [1] -0.4972571
> > ------------------------------------------------------------------------------------------
> > I do not know why the above correlation coefficients, computed from the
> > same data, are different.
> > They are 0.015 from as.factor(corridor1) and -0.497 from as.factor(corridor2).
> > The string variable "corridor1" holds the same categorical data as the
> > variable corridor2.
> > The difference is that "A" is replaced with "0", "B" with "1", "C"
> > with "2", .....
> > Could you tell me why they are different, and which correlation
> > coefficient is correct?
> One thing that strikes me is that corridor1 has 6 levels and corridor2
> has 5...
>
> In general correlations are not expected to work on factors so I'd be
> explicit about taking as.numeric(). A glance at
> table(corridor1,corridor2) should be informative too, as would a
> summary(as.numeric(as.factor(corridor1))-as.numeric(as.factor(corridor2)))
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907
Kum-Hoe Hwang, Ph.D.

