# Re: [R] chisq.test using amalgamation automatically (possible ?!?)

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon 27 Jun 2005 - 17:32:26 EST

You have actually used chisq.test to test independence in the cross-tabulation of x and y treated as factors: a sparse table with a single 1 in each column and 0 elsewhere (9 distinct values of x by 27 distinct values of y, hence df = 8 * 26 = 208). I doubt this was your intention, but unfortunately you have not told us what you actually intended.
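What chisq.test(x, y) does with two vectors can be checked directly. The sketch below uses the x from the quoted message and a stand-in y (an assumption made for brevity: any 27 distinct values give the same table shape), and only inspects the table and the degrees of freedom.

```r
x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
# Stand-in for the 27 distinct "expected" values (assumption: only their
# distinctness matters for this illustration).
y <- seq_along(x) / 10

# Two vectors are treated as factors and cross-tabulated.
tab <- table(x, y)
dim(tab)        # 9 distinct x values by 27 distinct y values

res <- suppressWarnings(chisq.test(x, y))
res$parameter   # df = (9 - 1) * (27 - 1) = 208
```

The df = 208 in the quoted output is therefore what an independence test on this table should report, not a goodness-of-fit test of x against y.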

Perhaps you intended y to be the expected values, but as they do not have the same sum as x it is not clear what distribution is appropriate. (The standard theory assumes that the expected values were obtained by multiplying supplied probabilities by the total count, which is why df=9 would be used with 10 categories.)
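If y was meant as expected counts, one possible reading (an assumption, since the intention was not stated) is a goodness-of-fit test with cell probabilities proportional to y. A minimal sketch using the `p` and `rescale.p` arguments of chisq.test, with y rounded to four decimals for brevity:

```r
x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
# Expected values from the quoted message, rounded to 4 decimals.
y <- c(9.1311, 13.1626, 12.6623, 11.0131, 9.1642, 7.4744, 6.0374,
       4.8535, 3.8925, 3.1180, 2.4962, 1.9977, 1.5986, 1.2791,
       1.0234, 0.8188, 0.6551, 0.5242, 0.4180, 0.3355, 0.2684,
       0.2148, 0.1718, 0.1375, 0.1100, 0.0880, 0.0704)

c(sum(x), sum(y))   # the two totals differ

# rescale.p = TRUE turns y into probabilities y / sum(y); the expected
# counts are then sum(x) * p, so they share the total of x.
gof <- suppressWarnings(chisq.test(x, p = y, rescale.p = TRUE))
gof$parameter       # df = 27 - 1 = 26
sum(gof$expected)   # equals sum(x)
```

The small-expected-count warning still applies here (it is only suppressed in the sketch), which is the motivation for amalgamating categories.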

You can use the expected values _if known in advance_ to amalgamate categories, but in most uses of chisq.test they are not known in advance. In any case, without some knowledge of the context, you cannot decide which categories should be merged: your choices are arbitrary unless the categories are ordered. Suppose they applied to types of fruit? If you do know which categories to merge, then certainly you can program R to do the amalgamation for you.
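As a rough sketch of such a program, assuming the categories are ordered and the expected values are known in advance, adjacent categories can be pooled from left to right until each pooled expected count reaches a threshold. The function name `amalgamate` and its argument `min.expected` are hypothetical, and y is rounded to four decimals for brevity.

```r
# Pool adjacent (ordered) categories until each pooled expected count
# is at least min.expected. Hypothetical helper, not part of R.
amalgamate <- function(observed, expected, min.expected = 5) {
  stopifnot(length(observed) == length(expected), length(expected) > 0)
  group <- integer(length(expected))
  g <- 1L
  running <- 0
  for (i in seq_along(expected)) {
    group[i] <- g
    running <- running + expected[i]
    if (running >= min.expected) {   # close this group, start the next
      g <- g + 1L
      running <- 0
    }
  }
  # A trailing group that never reached the threshold joins its left neighbour.
  if (g > 1L && group[length(group)] == g) group[group == g] <- g - 1L
  list(observed = as.vector(tapply(observed, group, sum)),
       expected = as.vector(tapply(expected, group, sum)),
       group    = group)
}

x <- c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
y <- c(9.1311, 13.1626, 12.6623, 11.0131, 9.1642, 7.4744, 6.0374,
       4.8535, 3.8925, 3.1180, 2.4962, 1.9977, 1.5986, 1.2791,
       1.0234, 0.8188, 0.6551, 0.5242, 0.4180, 0.3355, 0.2684,
       0.2148, 0.1718, 0.1375, 0.1100, 0.0880, 0.0704)

m <- amalgamate(x, y)
m$group   # categories 8:9, 10:11 and 12:27 are pooled

chi.sq <- sum((m$observed - m$expected)^2 / m$expected)
# df = number of pooled categories - 1, following the choice in the message.
p.value <- pchisq(chi.sq, df = length(m$observed) - 1, lower.tail = FALSE)
```

For these data the rule happens to reproduce the grouping chosen by hand in the quoted message. A different pooling rule (for example, from the right tail inwards) could give different groups, which is exactly the arbitrariness noted above.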

BTW, it is just confusing (at least to your readers) to supply the default values of arguments explicitly. pchisq(Chi.Sq, df=9, lower.tail=FALSE) would suffice, since ncp=0 and log.p=FALSE are the defaults.
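A small check of which arguments matter, using an illustrative statistic (the value 4.4 is an assumption, not taken from the thread). Of the arguments in the quoted call, only lower.tail differs from its default. The comparison uses all.equal() rather than identical() because, according to the ?Chisquare help page, supplying ncp = 0 explicitly selects the non-central algorithm.

```r
chi.sq <- 4.4   # illustrative value only

p.short <- pchisq(chi.sq, df = 9, lower.tail = FALSE)
p.long  <- pchisq(chi.sq, df = 9, ncp = 0, lower.tail = FALSE, log.p = FALSE)
all.equal(p.short, p.long)   # the two calls agree to numerical tolerance

# Dropping lower.tail = FALSE gives the lower tail, i.e. 1 - p.short.
p.lower <- pchisq(chi.sq, df = 9)
```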

On Sun, 26 Jun 2005, Mohammad Ehsanul Karim wrote:

> Dear List,
>
>
> If any observed and/or expected frequency is less than
> 5, then chisq.test (Pearson's Chi-squared
> Test for Count Data from package:stats) gives a warning
> message. For example,
>
> x<-c(10, 14, 10, 11, 11, 7, 8, 4, 1, 4, 4, 2, 1, 1, 2,
> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
> y<-c(9.13112391745095, 13.1626482033341,
> 12.6623267638188, 11.0130706413029, 9.16415925139016,
> 7.47441794889028, 6.03743388141852, 4.85350508692505,
> 3.89248001363859, 3.11803140037476, 2.49617540962629,
> 1.99774139023269, 1.5985926374167, 1.27909653584089,
> 1.02341602646530, 0.818828097315106,
> 0.655132353196336, 0.524159229418155,
> 0.418022824890164, 0.335528136508225,
> 0.268448671671046, 0.214779801990545,
> 0.171840507806838, 0.137485729582785,
> 0.109999238967747, 0.0880079144684513,
> 0.070413150156564)
>
> Chi.Sq <- sum((c(x[1:7], sum(x[8:9]), sum(x[10:11]), sum(x[12:27])) -
>                c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27])))^2 /
>               c(y[1:7], sum(y[8:9]), sum(y[10:11]), sum(y[12:27]))) # using amalgamation
> pchisq(Chi.Sq, df=9, ncp=0, lower.tail = FALSE, log.p = FALSE) # result being 0.8830207
>
> but chisq.test(x,y) gives the following output with
> incorrect df:
>
> Pearson's Chi-squared test
>
> data: x and y
> X-squared = 216, df = 208, p-value = 0.3373
>
> Warning message:
> Chi-squared approximation may be incorrect in:
> chisq.test(x, y)
>
> Is there any way to use chisq.test directly
> without a warning message in such cases (that is,
> using amalgamation conveniently so that we don't have
> to check whether each element is less than 5 or not,
> with the whole process being automatic, maybe by means
> of programming)?
>
> Any hint, help, support, references will be highly
> appreciated.
> Thank you for your time.

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help