Re: [R] Kolmogorov-Smirnov test

From: m.marcinmichal <m.marcinmichal_at_gmail.com>
Date: Thu, 28 Apr 2011 14:53:39 -0700 (PDT)

Hi,
thanks for the response.

>> The Kolmogorov-Smirnov test is designed for distributions on continuous
>> variables, not discrete ones like the Poisson. That is why you are getting
>> some of your warnings.

I read in "Fitting distributions with R" by Vito Ricci, page 19, that "... Kolmogorov-Smirnov test is used to decide if a sample comes from a population with a specific distribution. It can be applied both for discrete (count) data and continuous binned (even if some Authors do not agree on this point) and both for continuous variables", but on page 16 I read that "... while the Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distribution". I was a little confused, but I tried the test on my discrete data anyway.

Generally, as a first step, I try to fit a discrete or continuous distribution to my data (task: find a distribution for the empirical data). Question: can I approximate my discrete data by a continuous distribution? I know that sometimes the Poisson distribution can be approximated by the normal distribution. But what happens if I use another distribution, such as the log-normal or gamma?
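To make the approximation question concrete, here is a minimal sketch, not taken from my real data: it compares the Poisson CDF with a normal CDF of the same mean and variance, using a continuity correction. The helper name max_cdf_gap and the two lambda values are made up for illustration. It only covers the normal case; it says nothing about log-normal or gamma.

```r
# Hypothetical helper: largest absolute gap between the Poisson(lambda) CDF
# and the N(lambda, lambda) CDF, evaluated with a continuity correction.
max_cdf_gap <- function(lambda) {
  k <- 0:ceiling(lambda + 10 * sqrt(lambda))   # grid covering nearly all of the mass
  pois_cdf <- ppois(k, lambda)
  norm_cdf <- pnorm(k + 0.5, mean = lambda, sd = sqrt(lambda))
  max(abs(pois_cdf - norm_cdf))
}

gap_small <- max_cdf_gap(2)     # small lambda: skewed, so the gap is larger
gap_large <- max_cdf_gap(200)   # large lambda: much closer to normal
print(c(lambda_2 = gap_small, lambda_200 = gap_large))
```

The gap shrinks as lambda grows, which is why the normal approximation is usually trusted only when the mean count is large.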

I ran three more tests, all of them chi-square tests, but they return three different results. Suppose that we have the same data, i.e. vectorSentence. The tests:
1. One
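library(MASS)  # provides fitdistr()
param <- fitdistr(vectorSentence, "poisson")
# dpois(1:9, ...) assumes the observed values are exactly 1, ..., 9
chisq.test(table(vectorSentence), p = dpois(1:9, lambda = param$estimate[["lambda"]]), rescale.p = TRUE)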

X-squared = 272.8958, df = 8, p-value < 2.2e-16

2. Two
library(vcd)
gf <- goodfit(vectorSentence, type = "poisson", method = "MinChisq")
summary(gf)

             X^2 df     P(> X^2)
Pearson 404.3607  8 2.186332e-82

3. Three
library(fitdistrplus)  # provides fitdist() and gofstat()
fdistc <- fitdist(vectorSentence, "pois")
g <- gofstat(fdistc, print.test = TRUE)

Chi-squared statistic: 535.344
Degree of freedom of the Chi-squared distribution: 8
Chi-squared p-value: 1.824112e-110

Question: which result is correct?

I know that I can reject the null hypothesis in every case, i.e. conclude that the data do not come from a Poisson distribution. But which of the three results is the right one?
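As far as I understand the help pages, the three calls need not use the same estimate of lambda (goodfit with method = "MinChisq" does not use the maximum-likelihood estimate) or the same cells, and chisq.test does not know that lambda was estimated, so it reports (cells - 1) degrees of freedom. A minimal sketch of how one could check the effect of the cell choice by hand, using simulated counts rather than vectorSentence; the seed, sample size, lambda and the helper pearson_pois are all invented for illustration:

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rpois(500, lambda = 3)      # simulated counts standing in for real data

# Hypothetical helper: Pearson chi-square statistic for a Poisson fit,
# using cells 0, 1, ..., kmax - 1 and a final lumped cell ">= kmax".
pearson_pois <- function(x, lambda, kmax) {
  obs  <- c(tabulate(x + 1, nbins = kmax), sum(x >= kmax))
  p    <- c(dpois(0:(kmax - 1), lambda),
            ppois(kmax - 1, lambda, lower.tail = FALSE))
  expd <- length(x) * p
  stat <- sum((obs - expd)^2 / expd)
  df   <- length(obs) - 1 - 1    # cells - 1 - one estimated parameter
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

lambda_ml <- mean(x)             # maximum-likelihood estimate of lambda
res_a <- pearson_pois(x, lambda_ml, kmax = 6)
res_b <- pearson_pois(x, lambda_ml, kmax = 9)   # same data, different cells
print(rbind(res_a, res_b))
```

The same data and the same lambda give different statistics and degrees of freedom once the cells change, so differing numbers from different functions do not by themselves mean that one of them is wrong.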

On the other side, I am trying to solve another problem:
1. Suppose that we have reference data (dr) from some process (pr), saved in vectorSentence.
2. Suppose that we have two further data samples d1, d2 from two other processes p1, p2.
3. We know that all the data are discrete.

Task:
One: check whether the data d1, d2 are equal to the reference data (dr). This is not a problem: I use the CDF, a histogram, other measures, the chi-square test, etc. But can I use Kolmogorov-Smirnov to test a hypothesis about the cumulative distribution functions, i.e. F(d1) = F(dr), for my data?
Two: find the distribution of dr, discrete or, if possible, continuous.
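For task One, a sketch of one possible way around the ties problem: keep the two-sample KS statistic (the largest gap between the two empirical CDFs) but get the p-value by permutation instead of from the continuous-case formula. This is an assumption on my side, not something ks.test does by default. The samples below are simulated stand-ins for dr, d1, d2, and ks_stat and ks_perm are hypothetical helper names.

```r
set.seed(42)                     # arbitrary seed, for reproducibility only
dr <- rpois(300, lambda = 4)     # stand-in for the reference sample
d1 <- rpois(200, lambda = 4)     # stand-in sample from the same distribution
d2 <- rpois(200, lambda = 6)     # stand-in sample from a shifted distribution

# Two-sample KS statistic: largest gap between the two empirical CDFs.
ks_stat <- function(a, b) {
  grid <- sort(unique(c(a, b)))
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

# Permutation p-value: reshuffle the pooled sample and recompute the statistic.
# This does not rely on the data being continuous, so ties are allowed.
ks_perm <- function(a, b, B = 2000) {
  observed <- ks_stat(a, b)
  pooled <- c(a, b)
  n <- length(a)
  perm <- replicate(B, {
    idx <- sample(length(pooled), n)
    ks_stat(pooled[idx], pooled[-idx])
  })
  c(D = observed, p.value = (1 + sum(perm >= observed)) / (B + 1))
}

res1 <- ks_perm(dr, d1)
res2 <- ks_perm(dr, d2)
print(rbind(dr_vs_d1 = res1, dr_vs_d2 = res2))
```

B = 2000 permutations is an arbitrary choice; more permutations give a finer p-value at the cost of run time.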

Best

Marcin M.

--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3482349.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 29 Apr 2011 - 01:43:34 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.