Re: [R] Chi-Square Test Disagreement

From: Berwin A Turlach <berwin_at_maths.uwa.edu.au>
Date: Thu, 27 Nov 2008 00:46:31 +0800

G'day Andy,

On Wed, 26 Nov 2008 14:51:50 +0000
Andrew Choens <andy.choens_at_gmail.com> wrote:

> I was asked by my boss to do an analysis on a large data set, and I am
> trying to convince him to let me use R rather than SPSS.

Very laudable of you. :)

> This is the output from R:
> > chisq.test(test29)
>
> Pearson's Chi-squared test
>
> data: test29
> X-squared = 9.593, df = 4, p-value = 0.04787
>
> But, the same data in SPSS generates a p value of .051. It's a small
> but important difference.

Chuck has already explained the reason for this small difference. I just take issue with it being called an important difference. In my opinion, this difference is not important at all. It would only be important to people who still stick to arbitrary cut-off points that are mainly due to historical coincidence and the lack of computing power at that time.

If somebody tells you that this difference is important, ask him or her whether he or she would be willing to finance a room full of calculators (in the sense of Pearson's time) for you, and whether he or she wants you to do all your calculations and analyses with these calculators in future. Alternatively, you could ask the person whether he or she would like the anaesthetist at his or her next operation to use chloroform, given his or her nostalgic penchant for outdated rituals and methods.

> I played around and rescaled things, and tried different values for
> B, but I never could get R to reach .051.

Well, I have no problem getting something close to 0.051 when using simulated p-values; look at the last try. The second one might also be noteworthy. Unfortunately, I didn't save the seed beforehand.

> test29 <- matrix(c(110,358,71,312,29,139,31,77,13,32), byrow=TRUE, ncol=2)
> test29
     [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32
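
As a cross-check on the asymptotic result quoted at the top (X-squared = 9.593, df = 4, p-value = 0.04787), the statistic can be rebuilt from the expected counts. This is only a minimal sketch in base R; the names `expected`, `stat`, `dof` and `p_asym` are mine and do not appear in the original exchange.

```r
# Data from the post.
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Expected counts under independence:
# row total * column total / grand total.
expected <- outer(rowSums(test29), colSums(test29)) / sum(test29)

# Pearson statistic, degrees of freedom and asymptotic p-value.
stat   <- sum((test29 - expected)^2 / expected)
dof    <- (nrow(test29) - 1) * (ncol(test29) - 1)
p_asym <- pchisq(stat, df = dof, lower.tail = FALSE)

print(round(c(X.squared = stat, df = dof, p.value = p_asym), 5))
```

The values should agree with what `chisq.test(test29)` reports, since no continuity correction is applied to a table larger than 2 x 2.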

> chisq.test(test29, simul=TRUE)

        Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data: test29
X-squared = 9.593, df = NA, p-value = 0.04798

> chisq.test(test29, simul=TRUE)

        Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data: test29
X-squared = 9.593, df = NA, p-value = 0.05697

> chisq.test(test29, simul=TRUE, B=20000)

        Pearson's Chi-squared test with simulated p-value (based on 20000 replicates)

data: test29
X-squared = 9.593, df = NA, p-value = 0.0463

> chisq.test(test29, simul=TRUE, B=20000)

        Pearson's Chi-squared test with simulated p-value (based on 20000 replicates)

data: test29
X-squared = 9.593, df = NA, p-value = 0.0499

> chisq.test(test29, simul=TRUE, B=20000)

        Pearson's Chi-squared test with simulated p-value (based on 20000 replicates)

data: test29
X-squared = 9.593, df = NA, p-value = 0.0486

> chisq.test(test29, simul=TRUE, B=20000)

        Pearson's Chi-squared test with simulated p-value (based on 20000 replicates)

data: test29
X-squared = 9.593, df = NA, p-value = 0.05125
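
Since the seed was not saved for the runs above, here is a hedged sketch of how one might make such runs repeatable and judge how much scatter to expect. The seed value 2008 and the helper `mc_se` are illustrative assumptions, not part of the original session; `mc_se` uses the usual binomial approximation for the standard error of a simulated proportion.

```r
# Data from the post.
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

# Setting the same seed before each call reproduces the simulated p-value.
# The seed value itself is arbitrary (assumption for illustration).
set.seed(2008)
run1 <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)
set.seed(2008)
run2 <- chisq.test(test29, simulate.p.value = TRUE, B = 2000)
print(identical(run1$p.value, run2$p.value))

# Approximate Monte Carlo standard error of a simulated p-value near p,
# treating the B replicates as independent Bernoulli trials.
# mc_se is a hypothetical helper, not an R built-in.
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

p_asym   <- chisq.test(test29)$p.value
se_2000  <- mc_se(p_asym, 2000)
se_20000 <- mc_se(p_asym, 20000)
print(signif(c(B2000 = se_2000, B20000 = se_20000), 2))
```

With a true p-value near 0.048 this works out to roughly 0.005 for B = 2000 and 0.0015 for B = 20000, so simulated values scattered between about 0.046 and 0.057 are what one would expect to see, and 0.051 is well within reach.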

Cheers,

        Berwin


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 26 Nov 2008 - 16:50:48 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 26 Nov 2008 - 18:30:28 GMT.