# Re: [R] testing independence of categorical variables

From: Petr PIKAL <petr.pikal_at_precheza.cz>
Date: Fri, 7 Dec 2007 08:46:20 +0100

Hi

Well, R does exactly what it says. From the help page:

"Otherwise, x and y must be vectors or factors of the same length"

I do not know SAS but I presume that

> tables bloodtype*state

gives you something like

tab <- table(bloodtype, state)

and

chisq.test(tab)

should give you the expected result. You can also call chisq.test(bloodtype, state) directly. What you cannot do is test vectors of unequal **lengths**, and that is what the original poster did. I believe you cannot do that in SAS either.
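To make the distinction between length and number of levels concrete, here is a minimal sketch. The data are simulated: the sample size of 350, the seed, and the category labels are assumptions chosen only for illustration, with 5 and 7 categories to mirror the original question.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

n <- 350
# One observation per subject: x is drawn from 5 categories, y from 7,
# and both vectors have the same length n
x <- factor(sample(letters[1:5], n, replace = TRUE))
y <- factor(sample(1:7, n, replace = TRUE))

# Equal lengths: this works even though the numbers of levels differ
res <- chisq.test(x, y)
res

# Unequal lengths: chisq.test() stops with an error, captured here as text
err <- tryCatch(chisq.test(x[1:10], y),
                error = function(e) conditionMessage(e))
err
```

The point is that each position in x and y must describe the same observation, so the two vectors must pair up one to one; how many distinct values each takes is irrelevant.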

> x<-sample(letters[1:3], 10, replace=T)
> x
[1] "c" "a" "c" "c" "a" "c" "a" "c" "a" "a"
> y<-sample(1:5, 20, replace=T)
> y
[1] 2 5 1 1 2 5 2 3 1 5 5 5 1 5 5 3 2 2 5 1
> chisq.test(x,y)
Error in chisq.test(x, y) : 'x' and 'y' must have the same length
> x<-sample(letters[1:3], 20, replace=T)
> chisq.test(x,y)

Pearson's Chi-squared test

data: x and y
X-squared = 4.7937, df = 6, p-value = 0.5705

Warning message:
In chisq.test(x, y) : Chi-squared approximation may be incorrect
>
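For data that are already aggregated into counts, as in the SAS code quoted below, one possible R counterpart is sketched here. The counts are copied from those datalines; the data frame name `blood` and the use of xtabs() are my own choices, not something from the original thread. The second half recomputes the expected counts and the statistic by hand as a cross-check.

```r
# Counts copied from the SAS datalines quoted in this message
blood <- data.frame(
  bloodtype = rep(c("A", "B", "AB", "O"), times = 3),
  state     = rep(c("FL", "IA", "MO"), each = 4),
  count     = c(122, 117, 19, 244,     # FL
                1781, 351, 289, 3301,  # IA
                353, 269, 60, 713)     # MO
)

# Cross-tabulate, weighting each row by its count
# (the role played by 'weight count' in the SAS code)
tab <- xtabs(count ~ bloodtype + state, data = blood)
res <- chisq.test(tab)
res

# Same quantities by hand: E_ij = (row total i) * (column total j) / n
n        <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n
stat     <- sum((tab - expected)^2 / expected)
df       <- (nrow(tab) - 1) * (ncol(tab) - 1)
pval     <- pchisq(stat, df, lower.tail = FALSE)
```

Here chisq.test() receives a single 4 x 3 table rather than two vectors, so the question of matching lengths does not arise.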

Regards
Petr

r-help-bounces_at_r-project.org wrote on 06.12.2007 23:09:24:

>
> The chi-square test does not need your two categorical variables to have
> equal numbers of levels, nor is there a limit on the number of levels.
>
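> The chi-square procedure is as follows:
> \chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}
>
> Expected cell count: E_{ij} = n \cdot \frac{(i\text{-th row total})}{n} \cdot \frac{(j\text{-th column total})}{n}
>
> Degrees of freedom: df = (\text{rows} - 1)(\text{columns} - 1)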
>
> This way should not give you any errors if your calculations are all
> correct. I usually use SAS for calculations like this. Below is sample
> code I wrote to test whether US state and blood type are independent. You
> can modify it for your data and it should give you no error.
>
> data bloodtype;
> input bloodtype $ state $ count @@;
> datalines;
> A FL 122 B FL 117
> AB FL 19 O FL 244
> A IA 1781 B IA 351
> AB IA 289 O IA 3301
> A MO 353 B MO 269
> AB MO 60 O MO 713
> ;
> proc freq data=bloodtype;
> tables bloodtype*state
> / cellchi2 chisq expected norow nocol nopercent;
> weight count;
> quit;
>
>
> Best
> Ramin
> Gainesville
>
>
>
> Shoaaib Mehmood wrote:
> >
> > hi,
> >
> > is there a way of calculating or measuring dependence between two
> > categorical variables? i tried using the chi square test to test for
> > independence but i got an error saying that the lengths of the two
> > vectors don't match. Suppose X and Y are two factors. X has 5 levels
> > and Y has 7 levels. This is what i tried doing
> >
> >>temp<-chisq.test(x,y)
> >
> > but got error "the lengths of the two vectors don't match". any help
> > will be appreciated
> > --
> > Regards,
> > Rana Shoaaib Mehmood
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/testing-independence-of-categorical-variables-tf4855773.html#a14202348
> Sent from the R help mailing list archive at Nabble.com.