Re: [R] ks.test() output interpretation

From: Christoph Buser <buser_at_stat.math.ethz.ch>
Date: Wed 29 Jun 2005 - 01:32:28 EST

Hi

I would recommend graphical methods to compare two samples from possibly different distributions; see ?qqplot. Since the Kolmogorov-Smirnov test has very small power in many cases, you cannot conclude that two samples come from the same distribution merely because ks.test is not significant.
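
For instance, with two measurement vectors x and y (hypothetical names; the data below are only placeholders), a direct visual comparison is simply:

x <- rnorm(50, mean = 10, sd = 2)  # placeholder data standing in for method 1
y <- rnorm(50, mean = 10, sd = 2)  # placeholder data standing in for method 2
qqplot(x, y)
abline(0, 1)  # points near this line are consistent with equal distributions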

The following example illustrates one problem. In a short simulation we generate two samples (with 100 observations per sample) 1000 times. The first sample has a standard normal distribution, the second a t-distribution with 1 degree of freedom. For each of these 1000 pairs we run ks.test and save the p-value.

x1 <- matrix(nrow = 100, ncol = 1000)  # 1000 standard normal samples, 100 obs each
y1 <- matrix(nrow = 100, ncol = 1000)  # 1000 t(1) samples, 100 obs each
test1 <- numeric(1000)                 # p-values of the 1000 ks.tests
for(i in 1:1000) {
  set.seed(i)
  x1[,i] <- rnorm(100)
  y1[,i] <- rt(100, df = 1)
  test1[i] <- ks.test(x1[,i], y1[,i])$p.value
}
sum(test1 < 0.05)                      # number of significant tests at the 5% level

In only 309 of the 1000 cases does the test show a significant difference between the two samples. In all other cases we would conclude that the two samples have the same distribution. This is with 100 observations per group; with smaller groups the power is even worse.
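
To see this, one can rerun the same simulation with, say, 20 observations per sample (just a sketch along the lines of the code above; the exact number of rejections will vary):

x3 <- matrix(nrow = 20, ncol = 1000)
y3 <- matrix(nrow = 20, ncol = 1000)
test3 <- numeric(1000)
for(i in 1:1000) {
  set.seed(i)
  x3[,i] <- rnorm(20)
  y3[,i] <- rt(20, df = 1)
  test3[i] <- ks.test(x3[,i], y3[,i])$p.value
}
sum(test3 < 0.05)  # noticeably fewer rejections than with 100 observations per group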

If we look at nine randomly drawn pairs out of the 1000 simulations and plot their qqplots:

par(mfrow = c(3,3))       # 3 x 3 grid of plots
ind <- sample(1:1000, 9)  # pick nine of the simulated pairs at random
tmp <- sapply(ind, function(j) qqplot(x1[,j], y1[,j], xlab = paste("x1[,",j,"]"),
                                      ylab = paste("y1[,",j,"]")))

In many cases we can see that the two distributions are different. Compare this to the qqplots of two normally distributed random samples:

x2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
y2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
par(mfrow = c(3,3))
tmp <- sapply(1:9, function(j) qqplot(x2[,j], y2[,j], xlab = paste("x2[,",j,"]"),
                                      ylab = paste("y2[,",j,"]")))
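
One small addition that is not in the code above but can make the panels easier to read is a reference line: if both samples come from the same distribution, the points should lie roughly along y = x. For example:

par(mfrow = c(3,3))
tmp <- sapply(1:9, function(j) {
  qqplot(x2[,j], y2[,j], xlab = paste("x2[,",j,"]"), ylab = paste("y2[,",j,"]"))
  abline(0, 1, col = "grey")  # y = x reference line
})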

Of course there are situations in which graphical methods fail, too, but they are openly descriptive: they show you the two distributions and let you judge. The Kolmogorov-Smirnov test, in contrast, suggests a clear-cut result (that the two distributions are the same), which here is wrong or at least misleading.

Best regards,

Christoph Buser



Christoph Buser <buser@stat.math.ethz.ch> Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228

http://stat.ethz.ch/~buser/

kapo coulibaly writes:
> I'm using ks.test() to compare two different
> measurement methods. I don't really know how to
> interpret the output in the absence of critical value
> table of the D statistic. I guess I could use the
> p-value when available. But I also get the message
> "cannot compute correct p-values with ties ..." does
> it mean I can't use ks.test() for these data or I can
> still use the D statistic computed to make a decision
> whether the two samples come from the same
> distribution.
>
> Thanks!!
>
>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jun 29 01:38:07 2005
