From: Greg Snow <Greg.Snow_at_imail.org>
Date: Thu, 28 Apr 2011 14:40:38 -0600

A couple of things to consider:

What are you trying to accomplish? We may be able to give you a better approach.

> Hi,
> I have a problem with a Kolmogorov-Smirnov goodness-of-fit test. I am
> trying to fit a distribution to my data. I actually ran two tests:
> - # First Kolmogorov-Smirnov Tests fit
> - # Second Kolmogorov-Smirnov Tests fit
> (see below). The two tests return different results and I don't know
> which one is correct. The first test returns the lower D = 0.0234 and a
> low p-value = 0.00304. The lower D suggests that the two distribution
> functions (empirical and theoretical) nearly coincide, but the low
> p-value indicates that I can reject the hypothesis H0. On the other
> hand, this p-value is much higher than the p-value from the second
> test (< 2.2e-16). Which test is correct?
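D and the p-value answer different questions. D measures how far apart the two distribution functions are. The p-value also depends on how many observations back that distance up. The sketch below uses the standard large-sample Kolmogorov approximation, which is not necessarily the exact computation ks.test() performs. It assumes both samples have roughly 11999 observations. Under those assumptions, the same D = 0.0234 is clearly significant at this sample size but unremarkable for small samples. `kolmogorov_pvalue` and `n_eff_two_sample` are hypothetical helper names.

```r
# Asymptotic Kolmogorov tail probability (large-sample approximation):
#   P(sqrt(n_eff) * D > z) ~ 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * z^2)
# Hypothetical helper; ks.test() itself may use exact or refined methods.
kolmogorov_pvalue <- function(D, n_eff, terms = 100) {
  z <- sqrt(n_eff) * D
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * z^2))
  min(max(p, 0), 1)  # clamp rounding noise into [0, 1]
}

# Hypothetical helper: effective sample size for two samples of sizes n and m
n_eff_two_sample <- function(n, m) n * m / (n + m)

D <- 0.0234
p_large <- kolmogorov_pvalue(D, n_eff_two_sample(11999, 11999))
p_small <- kolmogorov_pvalue(D, n_eff_two_sample(100, 100))
c(p_large = p_large, p_small = p_small)
```

With about 12000 observations per sample this gives a p-value in the same range as the reported 0.00304. With 100 observations per sample the same D would be nowhere near significant. So a small D and a small p-value are not in conflict.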
>
> matr = rbind(c(1,2))
> layout(matr)
>
> # length(vectorSentence) = 11999
> vectorSentence <- c(....)
> vectorLength <- length(vectorSentence)
>
> # Assume table(vectorSentence) gives:
> #  1    2    3    4    5    6    7    8    9
> # 512 1878 2400 2572 1875 1206  721  520  315
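The actual values of vectorSentence are elided in the message. A reproducible stand-in can be rebuilt from the frequency table instead. This is only a sketch: it assumes the table lists every observed value, which is consistent with its counts summing to 11999.

```r
# Counts for the values 1..9, copied from table(vectorSentence) in the message
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)

# Repeat each value as often as it was observed
x <- rep(1:9, times = counts)

length(x)  # 11999, matching the stated length of vectorSentence
table(x)   # reproduces the frequency table
```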
>
> # Poisson parameter (fitdistr() is in the MASS package)
> library(MASS)
> param <- fitdistr(vectorSentence, "poisson")
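For the Poisson distribution, the maximum-likelihood estimate of lambda has a closed form: the sample mean. So fitdistr() is convenient but not strictly needed. Also, `param[[1]][1]` in the script is the same value as `param$estimate[["lambda"]]`.

The sketch below checks this on data rebuilt from the frequency table in the message. It assumes the MASS package is installed.

```r
library(MASS)  # provides fitdistr()

# Data rebuilt from the frequency table in the message
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)

fit <- fitdistr(x, "poisson")
lambda_mle <- fit$estimate[["lambda"]]

# The Poisson MLE is simply the sample mean
lambda_mean <- mean(x)
all.equal(lambda_mle, lambda_mean)  # TRUE
```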
>
> # Expected density
> density.exp <- dpois(1:9, lambda=param[[1]][1])
>
> # Expected frequ.
> frequ.exp <- dpois(1:9, lambda=param[[1]][1])*vectorLength
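The expected frequencies only cover the values 1 to 9, but a Poisson distribution also puts probability on 0 and on values above 9. As a result, sum(frequ.exp) falls short of vectorLength. The data also contain no 0s at all, although a Poisson distribution with lambda near 4.2 puts about 1.5% of its mass there.

This sketch estimates lambda by the sample mean of the data rebuilt from the frequency table, and shows how much mass is left out:

```r
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)
n <- length(x)
lambda <- mean(x)  # Poisson MLE

probs <- dpois(1:9, lambda)
frequ.exp <- probs * n

sum(probs)                              # less than 1
dpois(0, lambda)                        # mass at 0, never observed in the data
ppois(9, lambda, lower.tail = FALSE)    # mass above 9, outside the table
n - sum(frequ.exp)                      # expected observations not represented
```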
>
> # Construct a numeric vector of data values (y = vFrequ for the
> # Kolmogorov-Smirnov tests)
> vFrequ <- c()
> for(i in 1:length(frequ.exp)) {
> 	vFrequ <- append(vFrequ, rep(i, times=frequ.exp[i]))
> }
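The append() loop can be replaced by a single vectorized rep() call. rep() truncates non-integer `times` towards zero, so each expected count is effectively rounded down. Together with the mass missing at 0 and above 9, this leaves vFrequ with fewer than vectorLength values.

This sketch assumes lambda is the sample mean of the data rebuilt from the frequency table:

```r
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)
n <- length(x)
frequ.exp <- dpois(1:9, mean(x)) * n

# Same result as the append() loop; floor() makes the truncation explicit
vFrequ <- rep(1:9, times = floor(frequ.exp))

length(vFrequ) < n  # TRUE
```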
>
> # Check the transformation: plot(density.exp, ylim=c(0,0.20)) should
> # look the same as plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
> plot(table(vectorSentence)/vectorLength)
> plot(density.exp, ylim=c(0,0.20))
> par(new=TRUE)
> plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
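The transformation can also be checked numerically instead of by comparing overlaid plots by eye. Because each count is rounded down, each plotted proportion should differ from density.exp by less than 1/vectorLength.

This sketch rebuilds the data from the frequency table and uses the sample mean as lambda. It uses tabulate() rather than table() so that all nine values are always counted.

```r
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)
n <- length(x)
density.exp <- dpois(1:9, mean(x))
vFrequ <- rep(1:9, times = floor(density.exp * n))

# Proportions as plotted: counts of 1..9 divided by the full sample size
prop.vFrequ <- tabulate(vFrequ, nbins = 9) / n

# Rounding down loses less than one observation per value
max(abs(prop.vFrequ - density.exp)) < 1 / n  # TRUE
```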
>
> # First Kolmogorov-Smirnov Tests fit
> ks.test(vectorSentence, vFrequ)
>
> # Second Kolmogorov-Smirnov Tests fit
> ks.test(vectorSentence, "dpois", lambda=param[[1]][1])
>
> # Output of the first Kolmogorov-Smirnov test
>
> Two-sample Kolmogorov-Smirnov test
>
> data:  vectorSentence and vFrequ
> D = 0.0234, p-value = 0.00304
> alternative hypothesis: two-sided
>
> Warning message:
> In ks.test(vectorSentence, vFrequ) :
>   cannot compute correct p-values with ties
>
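vFrequ contains only the values 1 to 9. So the first (two-sample) test effectively compares the data with a Poisson distribution truncated to 1..9 and rescaled to sum to 1, not with the fitted Poisson itself.

The sketch below computes both distances directly on data rebuilt from the frequency table. Both empirical CDFs are step functions that only jump at 1..9, so checking those points is enough.

```r
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)
n <- length(x)
lambda <- mean(x)
vFrequ <- rep(1:9, times = floor(dpois(1:9, lambda) * n))

# Two-sample D: largest gap between the two empirical CDFs
grid <- 1:9
D_two <- max(abs(ecdf(x)(grid) - ecdf(vFrequ)(grid)))

# Distance to a Poisson restricted to 1..9 and renormalized
p_trunc <- dpois(grid, lambda) / sum(dpois(grid, lambda))
D_trunc <- max(abs(ecdf(x)(grid) - cumsum(p_trunc)))

c(D_two = D_two, D_trunc = D_trunc)
```

Both values come out close to the D = 0.0234 reported for the first test.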
>
> # Output of the second Kolmogorov-Smirnov test
>
> One-sample Kolmogorov-Smirnov test
>
> data:  vectorSentence
> D = 0.9832, p-value < 2.2e-16
> alternative hypothesis: two-sided
>
> Warning message:
> In ks.test(vectorSentence, "dpois", lambda = param[[1]][1]) :
>   cannot compute correct p-values with ties
>
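The second call passes "dpois", the probability mass function, where ks.test() expects a cumulative distribution function such as "ppois". That appears to explain the D = 0.9832. At the largest value, 9, the empirical CDF is 1, while dpois(9, lambda) is only about 0.017.

The sketch below computes both quantities on data rebuilt from the frequency table. The correct one-sample distance is computed by hand. Both CDFs only jump at integers, so checking 0 to max(x) suffices.

```r
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)
lambda <- mean(x)

# What the "dpois" call effectively measured at the largest observation
D_dpois <- 1 - dpois(9, lambda)

# One-sample distance against the fitted Poisson CDF
grid <- 0:max(x)
D_ppois <- max(abs(ecdf(x)(grid) - ppois(grid, lambda)))

c(D_dpois = D_dpois, D_ppois = D_ppois)

# The matching call would be ks.test(x, "ppois", lambda = lambda), but it
# still warns about ties, and its p-value is not exact for discrete data or
# for a lambda estimated from the same data.
```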
>
>
> Best
>
> Marcin M.
>
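The data are discrete and lambda is estimated from the same data, so neither p-value above can be taken at face value. One common workaround is a parametric bootstrap. It is a swapped-in technique, not something proposed elsewhere in this thread. The idea is:

- simulate Poisson samples from the fitted lambda;
- re-estimate lambda for each simulated sample;
- count how often the simulated D is at least as large as the observed one.

The sketch below uses data rebuilt from the frequency table. B = 199 is an arbitrary choice kept small for speed, and `ks_pois_stat` is a hypothetical helper name.

```r
set.seed(1)

counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)
n <- length(x)

# Hypothetical helper: sup-distance between the empirical CDF of x and a
# Poisson CDF. Both jump only at integers, and above max(x) the gap only
# shrinks, so checking 0..max(x) covers the supremum.
ks_pois_stat <- function(x, lambda) {
  grid <- 0:max(x)
  max(abs(ecdf(x)(grid) - ppois(grid, lambda)))
}

lambda_hat <- mean(x)                 # Poisson MLE
D_obs <- ks_pois_stat(x, lambda_hat)

B <- 199                              # arbitrary, small for speed
D_boot <- replicate(B, {
  xs <- rpois(n, lambda_hat)          # simulate under the fitted model
  ks_pois_stat(xs, mean(xs))          # re-estimate lambda, as for the data
})

# Share of simulated D at least as large as the observed one
p_boot <- (1 + sum(D_boot >= D_obs)) / (B + 1)
p_boot
```

With B replicates this p-value cannot go below 1/(B + 1), so a larger B is needed for a finer answer.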
