From: Christoph Buser <buser_at_stat.math.ethz.ch>

Date: Wed 29 Jun 2005 - 01:32:28 EST

Christoph Buser <buser@stat.math.ethz.ch> Seminar fuer Statistik, LEO C13

http://stat.ethz.ch/~buser/

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Wed Jun 29 01:38:07 2005


Hi

I would recommend graphical methods to compare two samples from possibly different distributions. See ?qqplot. Since the Kolmogorov-Smirnov test often has very low power, you cannot conclude that two samples come from the same distribution merely because ks.test() is not significant.
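For reference, the D statistic that ks.test() reports for two samples is the largest vertical distance between their two empirical distribution functions. The following is a minimal sketch that computes it by hand and compares it with ks.test(); the helper name ks_D is hypothetical, and the sketch assumes continuous data without ties.

```r
# Hypothetical helper: two-sample Kolmogorov-Smirnov statistic D,
# i.e. the largest vertical gap between the two empirical CDFs.
ks_D <- function(x, y) {
  Fx <- ecdf(x)
  Fy <- ecdf(y)
  # Both ECDFs are step functions that jump only at observed values,
  # so the maximal gap is attained at one of the pooled data points.
  z <- sort(c(x, y))
  max(abs(Fx(z) - Fy(z)))
}

set.seed(1)
x <- rnorm(100)          # standard normal sample
y <- rt(100, df = 1)     # t-distribution with 1 df
D_manual <- ks_D(x, y)
D_ks <- unname(ks.test(x, y)$statistic)
c(manual = D_manual, ks.test = D_ks)
```

A large D only says that the two ECDFs differ somewhere; the p-value then tells you whether that gap is large relative to the sample sizes.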

The following example illustrates the problem. In a short simulation we generate 1000 pairs of samples (100 observations per sample). The first sample comes from a standard normal distribution, the second from a t-distribution with 1 degree of freedom (i.e. a Cauchy distribution). For each of these 1000 pairs we run ks.test() and store the p-value.

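# storage: one column per simulated pair
x1 <- matrix(nrow = 100, ncol = 1000)
y1 <- matrix(nrow = 100, ncol = 1000)
test1 <- numeric(1000)
for(i in 1:1000) {
  set.seed(i)
  x1[,i] <- rnorm(100)           # sample 1: standard normal
  y1[,i] <- rt(100, df = 1)      # sample 2: t with 1 df (Cauchy)
  test1[i] <- ks.test(x1[,i], y1[,i])$p.value
}
sum(test1 < 0.05)   # number of tests significant at the 5% level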

In only 309 of the 1000 cases does the test show a significant difference between the two samples (at the 5% level). In all other cases we would conclude that the two samples have the same distribution. This is an example with 100 observations per group; with smaller groups the power is even worse.
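To see how the power depends on the group size, the simulation can be repeated for several sizes. This is only a sketch: the group sizes, the 500 replications per size and the helper name ks_power are illustrative assumptions, and the estimates will vary with the seed and the R version.

```r
# Hypothetical helper: Monte Carlo estimate of the power of ks.test()
# for N(0,1) vs t(1) samples of size n each, at significance level alpha.
ks_power <- function(n, nsim = 500, alpha = 0.05) {
  pvals <- replicate(nsim, ks.test(rnorm(n), rt(n, df = 1))$p.value)
  mean(pvals < alpha)   # fraction of significant tests
}

set.seed(42)
sizes <- c(10, 20, 50, 100)   # assumed group sizes
est_power <- sapply(sizes, ks_power)
names(est_power) <- sizes
round(est_power, 3)
```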

Now look at 9 randomly drawn pairs out of the 1000 simulations and draw their Q-Q plots:

par(mfrow = c(3,3))
ind <- sample(1:1000, 9)
tmp <- sapply(ind, function(j) qqplot(x1[,j], y1[,j],
                                      xlab = paste("x1[,",j,"]"),
                                      ylab = paste("y1[,",j,"]")))

In many of these plots the difference between the two distributions is clearly visible. Compare them with the Q-Q plots of two normally distributed samples:

x2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
y2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
par(mfrow = c(3,3))
tmp <- sapply(1:9, function(j) qqplot(x2[,j], y2[,j],
                                      xlab = paste("x2[,",j,"]"),
                                      ylab = paste("y2[,",j,"]")))

Of course there are situations in which graphical methods fail
too, but the plots make it apparent that they are a useful
descriptive way to compare two distributions.
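The Q-Q plots above are easier to read with the line y = x added: if both samples come from the same distribution, the points should scatter around that line. A minimal sketch with one heavy-tailed pair and one normal pair side by side (the seed and panel titles are arbitrary choices):

```r
# Q-Q plots with a y = x reference line. Points near the line suggest
# similar distributions; systematic departures in the ends (here the
# heavy tails of the t-distribution with 1 df) suggest different ones.
set.seed(3)
x   <- rnorm(100)
y_t <- rt(100, df = 1)   # heavy-tailed alternative
y_n <- rnorm(100)        # same distribution as x

op <- par(mfrow = c(1, 2))
qq_t <- qqplot(x, y_t, xlab = "rnorm(100)", ylab = "rt(100, df = 1)",
               main = "N(0,1) vs t(1)")
abline(0, 1, col = "red")
qq_n <- qqplot(x, y_n, xlab = "rnorm(100)", ylab = "rnorm(100)",
               main = "N(0,1) vs N(0,1)")
abline(0, 1, col = "red")
par(op)
```

qqplot() invisibly returns the sorted samples as a list with components x and y, which is why the results can be stored in qq_t and qq_n.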

Relying on the Kolmogorov-Smirnov test suggests a clear-cut
result (that the two distributions are the same), which is
wrong or at least misleading.

Best regards,

Christoph Buser


ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228


kapo coulibaly writes:

> I'm using ks.test() to compare two different
> measurement methods. I don't really know how to
> interpret the output in the absence of critical value
> table of the D statistic. I guess I could use the
> p-value when available. But I also get the message
> "cannot compute correct p-values with ties ..." does
> it mean I can't use ks.test() for these data or I can
> still use the D statistic computed to make a decision
> whether the two samples come from the same
> distribution.
>
> Thanks!!

