RE: [R] chisq.test() as a goodness of fit test

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Fri 14 Jan 2005 - 05:30:58 EST


On 13-Jan-05 Vito Ricci wrote:
> Dear R-Users,
>
> How can I use chisq.test() as a goodness of fit test?
> Reading man-page I've some doubts that kind of test is
> available with this statement. Am I wrong?
>
>
> X2=sum(((O-E)^2)/E)
>
> O=empirical frequencies
> E=expected freq. calculated with the model (such as
> normal distribution)
>
> See:
> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
> for X2 used as a goodness of fit test.

It is not obvious from "?chisq.test", though it is in fact the case, that chisq.test() can perform the sort of test you are looking for. No doubt this is because so much of the help page is devoted to the contingency table case.

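However, if you use it in the form

  chisq.test(x, p = p)

where x is a vector of counts in "bins" and p is a vector, of the same length as x, of the probabilities that a random observation will fall in the various bins, then it is that sort of test. (The argument p has to be given by name: the second positional argument of chisq.test() is y, so chisq.test(x,p) would cross-classify the two vectors as a contingency table instead.)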

So, for example, dissect the range of X into k intervals (-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf), and let N1, N2, ... , Nk be the numbers of observations in these intervals. If you
let

  x = c(N1,...,Nk)

  p = c(pnorm(X1),
        pnorm(c(X2,...,X[k-1])) - pnorm(c(X1,...,X[k-2])),
        1-pnorm(X[k-1]) )

then

  chisq.test(x, p = p)

will test the goodness of fit of the normal distribution. (Note that the above is schematic pseudo-R code, not real R code!)
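
A runnable version of the schematic code above might look like the following. It is only a sketch: the data are simulated, the cut points are arbitrary, and the names obs, cuts and breaks are made up for the illustration. The hypothesised distribution is N(0,1), with nothing estimated from the data.

```r
set.seed(1)                        # simulated data, purely for illustration
obs <- rnorm(200)                  # sample to be tested against N(0,1)

cuts   <- c(-1.5, -0.5, 0.5, 1.5)  # X1, ..., X[k-1]; arbitrary choice, k = 5 bins
breaks <- c(-Inf, cuts, Inf)

# x: counts N1, ..., Nk in (-Inf,X1], (X1,X2], ..., (X[k-1],Inf)
x <- as.vector(table(cut(obs, breaks = breaks)))

# p: probability of each bin under the fully specified N(0,1)
p <- diff(pnorm(breaks))

res <- chisq.test(x, p = p)        # p is passed by name, not by position
print(res)                         # reports X-squared, df = k - 1 and the P-value
```

Here diff(pnorm(breaks)) does in one step what the three-part c(...) expression above spells out.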

However, this use of chisq.test(x,p) is limited (as far as I can see) to the case where no parameters have been estimated in choosing the distribution from which p is calculated, and so will be based on the wrong number of degrees of freedom if the distribution is estimated from the data. I cannot see any provision for specifying either the degrees of freedom, or the number of parameters estimated for p, in the documentation for chisq.test().

So in the latter case you are better off doing it directly. This is no more difficult, since the hard work is in calculating the elements of p. After that, with N the total number of observations and E=N*p,

  X2 <- sum(((O-E)^2)/E)

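has (approximately, for large N) the chi-squared distribution with df=(k-1-r) d.f., where k is the number of "bins" and r is the number of parameters that have been estimated; the extra 1 is lost because the E's are constrained to sum to N. So get the P-value as 1-pchisq(X2,df).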
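
As a sketch of the direct calculation, suppose the mean and SD of a normal distribution are both estimated from the data, so r = 2. The data below are simulated and the bin boundaries are an arbitrary choice. The degrees of freedom are taken as k - 1 - r: one is lost because the expected counts must sum to N, plus one for each estimated parameter. Note that estimating the parameters from the raw observations, rather than from the binned counts, makes the chi-squared reference distribution only approximate.

```r
set.seed(2)                             # simulated data, purely for illustration
obs <- rnorm(200, mean = 10, sd = 2)

m <- mean(obs)                          # estimated mean
s <- sd(obs)                            # estimated SD
r <- 2                                  # number of estimated parameters

# k = 5 bins, with boundaries placed relative to the fitted distribution
breaks <- c(-Inf, m + s * c(-1.5, -0.5, 0.5, 1.5), Inf)
O <- as.vector(table(cut(obs, breaks = breaks)))   # observed counts
k <- length(O)

p <- diff(pnorm(breaks, mean = m, sd = s))         # fitted bin probabilities
E <- sum(O) * p                                    # expected counts, E = N*p

X2   <- sum(((O - E)^2) / E)
df   <- k - 1 - r
pval <- 1 - pchisq(X2, df)

c(X2 = X2, df = df, P = pval)
```

The statistic X2 is the same one that chisq.test(O, p = p) reports; only the degrees of freedom (and hence the P-value) differ.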

Best wishes,
Ted.



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 13-Jan-05                                       Time: 18:30:58
