Re: [R] Query: chi-square test

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Tue 11 Jul 2006 - 00:28:12 EST

"priti desai" <priti.desai@kalyptorisk.com> writes:

> Hi,
> I have calculated a chi-square goodness-of-fit test for a sample assumed
> to come from a Poisson distribution.
> Please copy this script into R and run it.
> The R script is as follows:
>
> ########################## start
> #########################################
>
> No_of_Frauds<-
> c(4,1,6,9,9,10,2,4,8,2,3,0,1,2,3,1,3,4,5,4,4,4,9,5,4,3,11,8,12,3,10,0,7)
>
>
>
> lambda<- mean(No_of_Frauds)
>
>
> # Chi-Squared Goodness of Fit Test
>
> # Ho: The data follow a specified distribution Vs H1: Not Ho
>
> # observed frequencies
>
> variable.cnts <- table(No_of_Frauds)
> variable.cnts
>
> variable.cnts.prs <- dpois(as.numeric(names(variable.cnts)), lambda)
> variable.cnts.prs
>
> variable.cnts <- c(variable.cnts, 0)
> variable.cnts
> variable.cnts.prs <- c(variable.cnts.prs, 1-sum(variable.cnts.prs))
> variable.cnts.prs
>
> tst <- chisq.test(variable.cnts, p=variable.cnts.prs)
> tst
>
> ######################### end ########################################
>
>
> The result of R is as follows
>
> Warning message:
> Chi-squared approximation may be incorrect in: chisq.test(variable.cnts,
> p = variable.cnts.prs)
> > tst
>
> Chi-squared test for given probabilities
>
> data: variable.cnts
> X-squared = 40.5614, df = 13, p-value = 0.0001122
>
>
> But I have done the calculations in Excel, and I am getting a different answer.
>
> Observed = 2,3,3,5,7,2,1,1,2,3,2,1,1,0
> Expected = 0.251005528, 1.224602726, 2.987288468, 4.85811559, 5.925428863,
>   5.781782103, 4.701348074, 3.276697142, 1.998288788, 1.083247457,
>   0.528493456, 0.234400679, 0.095299266, 0.035764993
>
>
> Estimated Parameter =4.878788
>
> Chi square stat = 0.000113
>
>
> My Excel answer tallies with the book I referred to for Excel.
> Please tell me the correct calculation in R,
> and how to interpret the results in R.

As far as I can see, the "Chi square stat" in Excel is essentially the p-value in R. The slight difference appears to arise from Excel using the point probability P(X = 13), rather than the tail probability P(X >= 13), in the last cell:

> O <- c(2,3,3,5,7,2,1,1,2,3,2,1,1,0)
> E <- c(0.251005528,1.224602726,2.987288468,4.85811559,5.925428863,
+ 5.781782103,4.701348074,3.276697142,1.998288788,1.083247457,0.528493456,
+ 0.234400679,0.095299266,0.035764993)
> (O-E)^2/E

 [1] 1.218691e+01 2.573925e+00 5.409021e-05 4.143826e-03 1.948725e-01
 [6] 2.473610e+00 2.914053e+00 1.581883e+00 1.465377e-06 3.391598e+00
[11] 4.097178e+00 2.500600e+00 8.588560e+00 3.576499e-02

> sum((O-E)^2/E)
[1] 40.54315
> pchisq(sum((O-E)^2/E), 13, lower.tail=FALSE)

[1] 0.0001129818
> E
 [1] 0.25100553 1.22460273 2.98728847 4.85811559 5.92542886 5.78178210
 [7] 4.70134807 3.27669714 1.99828879 1.08324746 0.52849346 0.23440068
[13] 0.09529927 0.03576499

> sum(E)

[1] 32.98176
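
The comparison above can be reproduced from the raw data. The following is a minimal sketch, assuming the Excel sheet used cells 0 to 12 plus a final cell for 13; the names `chisq_stat`, `p_point`, and `p_tail` are illustrative and do not appear in the original script.

```r
# Data from the original post
x <- c(4,1,6,9,9,10,2,4,8,2,3,0,1,2,3,1,3,4,5,4,4,4,9,5,4,3,11,8,12,3,10,0,7)
n <- length(x)
lambda <- mean(x)  # estimated Poisson parameter, about 4.878788

# Observed counts for the values 0..12, plus one empty final cell
obs <- c(table(factor(x, levels = 0:12)), 0)

# Final cell as a point probability P(X = 13): the Excel version
p_point <- dpois(0:13, lambda)
# Final cell as a tail probability P(X >= 13): what the R script effectively used
p_tail <- c(dpois(0:12, lambda), ppois(12, lambda, lower.tail = FALSE))

# Pearson chi-square statistic from observed and expected counts
chisq_stat <- function(obs, expd) sum((obs - expd)^2 / expd)

stat_point <- chisq_stat(obs, n * p_point)
stat_tail  <- chisq_stat(obs, n * p_tail)

# Naive p-values on 13 degrees of freedom, as in both original calculations
p_value_point <- pchisq(stat_point, df = 13, lower.tail = FALSE)
p_value_tail  <- pchisq(stat_tail,  df = 13, lower.tail = FALSE)

print(c(point = stat_point, tail = stat_tail))
print(c(point = p_value_point, tail = p_value_tail))
```

Only the last expected count differs between the two versions, which is why the point-probability expected counts sum to slightly less than 33.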

Please don't assume that something is correct, just because it is Excel output and some book mindlessly copied it...

Both calculations are wrong, because they ignore the fact that lambda has been estimated from the data (which costs a degree of freedom), and also because they rely on very small expected cell counts, where the chi-square approximation is unreliable. For a better test, you likely need to simulate the distribution of the chi-square statistic, or, as I'd be inclined to do, go directly for the pretty obvious overdispersion:

> X <- No_of_Frauds # the data vector from the original post
> var(X)

[1] 11.17235
> var(X)/mean(X) # expected is ca. 1 in the Poisson distrib.
[1] 2.289984
> r <- replicate(100000,{X <- rpois(33, 4.87879); var(X)/mean(X)})
> sum(r > 2.289984)
[1] 5
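
The other option mentioned above, simulating the distribution of the chi-square statistic itself, might look roughly like the sketch below. It is a parametric bootstrap under stated assumptions: the cells are fixed at 0 to 12 plus one tail cell, lambda is re-estimated in every simulated sample, and the helper name `gof_stat`, the seed, and the replicate count are all illustrative choices rather than anything from the original post.

```r
# Data from the original post
x <- c(4,1,6,9,9,10,2,4,8,2,3,0,1,2,3,1,3,4,5,4,4,4,9,5,4,3,11,8,12,3,10,0,7)

# Pearson chi-square statistic for a Poisson fit, with lambda estimated
# from the sample. Cells are 0, 1, ..., kmax plus one tail cell (> kmax).
# kmax = 12 mirrors the cells used in the thread (an assumption).
gof_stat <- function(x, kmax = 12) {
  n <- length(x)
  lambda <- mean(x)
  # Values above kmax are folded into the last bin
  obs <- tabulate(pmin(x, kmax + 1) + 1, nbins = kmax + 2)
  p <- c(dpois(0:kmax, lambda), ppois(kmax, lambda, lower.tail = FALSE))
  expd <- n * p
  sum((obs - expd)^2 / expd)
}

set.seed(1)  # arbitrary seed, only so that reruns give the same numbers

obs_stat <- gof_stat(x)  # about 40.56, as reported by chisq.test in the thread

# Simulate Poisson samples of the same size at the fitted lambda and
# recompute the statistic, re-estimating lambda each time
sim <- replicate(2000, gof_stat(rpois(length(x), mean(x))))

# Simulated p-value: share of simulated statistics at least as large
p_sim <- mean(sim >= obs_stat)
print(obs_stat)
print(p_sim)
```

Because lambda is re-estimated inside `gof_stat`, the simulated reference distribution accounts for the estimation step and for the small expected counts, neither of which the tabulated chi-square distribution on 13 degrees of freedom does.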

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Tue Jul 11 00:42:35 2006
