# Re: [R] Kolmogorov-Smirnov test

From: Greg Snow <Greg.Snow_at_imail.org>
Date: Thu, 28 Apr 2011 14:40:38 -0600

A couple of things to consider:

What are you trying to accomplish? We may be able to give you a better approach.
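The two statistics in the quoted question can be reproduced from the frequency table the poster gives. This is a minimal R sketch. It assumes that table is the complete data set, so `vectorSentence` can be rebuilt with `rep()`. The names `counts`, `D_dpois`, `D_ppois` and `d_discrete` are hypothetical. It illustrates three things: `ks.test()` needs a cumulative distribution function (`"ppois"`, not `"dpois"`); even with `"ppois"` the one-sample test is distorted by the ties a discrete distribution produces; and `vFrequ` in the first test is not a second sample.

```r
# Rebuild the data from the frequency table in the question
# (sentence lengths 1..9, 11999 observations in total).
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
vectorSentence <- rep(1:9, times = counts)
n <- length(vectorSentence)

# For a Poisson model the ML estimate of lambda is the sample mean;
# MASS::fitdistr(vectorSentence, "poisson") returns the same value.
lambda <- mean(vectorSentence)

# First test: vFrequ is built from *expected* counts. rep() truncates the
# non-integer counts toward zero and values 0 and > 9 are never generated,
# so vFrequ is a fixed vector shorter than the data, not a second sample.
frequ.exp <- dpois(1:9, lambda) * n
vFrequ <- rep(1:9, times = frequ.exp)

# Second test: ks.test() expects y to name a cumulative distribution
# function. "dpois" makes it compare the ECDF with point probabilities.
D_dpois <- suppressWarnings(
  ks.test(vectorSentence, "dpois", lambda = lambda)$statistic)

# "ppois" is the right function, but ks.test() assumes a continuous null.
# With ties it compares F(k) with the proportion of data strictly below k,
# so the jumps of the Poisson CDF inflate D and the p-value is not valid.
D_ppois <- suppressWarnings(
  ks.test(vectorSentence, "ppois", lambda = lambda)$statistic)

# Sup-distance between the ECDF and the Poisson CDF, evaluated at the
# support points 0..max(x), where both step functions can jump.
k <- 0:max(vectorSentence)
d_discrete <- max(abs(ecdf(vectorSentence)(k) - ppois(k, lambda)))

print(c(n_data = n, n_vFrequ = length(vFrequ)))
print(round(c(dpois = unname(D_dpois), ppois = unname(D_ppois),
              discrete = d_discrete), 4))
```

If the reconstruction is right, `D_dpois` reproduces the D = 0.9832 of the second test. At the largest value, 9, the ECDF equals 1 while `dpois(9, lambda)` is only about 0.017. `d_discrete` is the statistic one would want for a discrete null. However, its p-value cannot be read from the usual Kolmogorov distribution. Also, because `lambda` is estimated from the same data, the standard p-value would not be exact even for continuous data.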

```
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of m.marcinmichal
> Sent: Wednesday, April 27, 2011 3:23 PM
> To: r-help_at_r-project.org
> Subject: [R] Kolmogorov-Smirnov test
>
> Hi,
> I have a problem with a Kolmogorov-Smirnov goodness-of-fit test. I am
> trying to fit a distribution to my data. I ran two tests:
> - # First Kolmogorov-Smirnov Tests fit
> - # Second Kolmogorov-Smirnov Tests fit
> (see below). The two tests return different results and I don't know
> which one is correct. The first test returns a lower D = 0.0234 and a
> low p-value = 0.00304. The low D indicates that the two distribution
> functions (empirical and theoretical) coincide, but the low p-value
> indicates that I can reject the hypothesis H0. On the other hand, this
> p-value is much higher than the p-value from the second test
> (< 2.2e-16). Which result, and which test, is correct?
>
> matr = rbind(c(1,2))
> layout(matr)
>
> # length vectorSentence = 11999
> vectorSentence <- c(....)
> vectorLength <- length(vectorSentence)
>
> # assume that we have a table(vectorSentence)
> #  1    2    3    4    5    6    7    8    9
> # 512 1878 2400 2572 1875 1206  721  520  315
>
> # Poisson parameter (fitdistr() comes from package MASS)
> library(MASS)
> param <- fitdistr(vectorSentence, "poisson")
>
> # Expected density
> density.exp <- dpois(1:9, lambda=param[[1]][1])
>
> # Expected frequ.
> frequ.exp <- dpois(1:9, lambda=param[[1]][1])*vectorLength
>
> # Construct numeric vector of data values (y = vFrequ for the
> # Kolmogorov-Smirnov Tests)
> vFrequ <- c()
> for(i in 1:length(frequ.exp)) {
> 	vFrequ <- append(vFrequ, rep(i, times=frequ.exp[i]))
> }
>
> # Check transformation: plot(density.exp, ylim=c(0,0.20)) should match
> # plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
> plot(table(vectorSentence)/vectorLength)
> plot(density.exp, ylim=c(0,0.20))
> par(new=TRUE)
> plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
>
> # First Kolmogorov-Smirnov Tests fit
> ks.test(vectorSentence, vFrequ)
>
> # Second Kolmogorov-Smirnov Tests fit
> ks.test(vectorSentence, "dpois", lambda=param[[1]][1])
>
> # First Kolmogorov-Smirnov Tests fit return data
>
> Two-sample Kolmogorov-Smirnov test
>
> data:  vectorSentence and vFrequ
> D = 0.0234, p-value = 0.00304
> alternative hypothesis: two-sided
>
> Warning message:
> In ks.test(vectorSentence, vFrequ) :
>   cannot compute correct p-values with ties
>
>
> # Second Kolmogorov-Smirnov Tests fit return data
>
> One-sample Kolmogorov-Smirnov test
>
> data:  vectorSentence
> D = 0.9832, p-value < 2.2e-16
> alternative hypothesis: two-sided
>
> Warning message:
> In ks.test(vectorSentence, "dpois", lambda = param[[1]][1]) :
>   cannot compute correct p-values with ties
>
>
>
> Best
>
> Marcin M.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3479506.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
```
