From: Christoph Buser <buser_at_stat.math.ethz.ch>

Date: Wed 23 Mar 2005 - 00:32:14 EST

Christoph Buser <buser@stat.math.ethz.ch> Seminar fuer Statistik, LEO C11

http://stat.ethz.ch/~buser/

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Wed Mar 23 00:44:31 2005


Dear Anthony

I don't know how SAS calculates the p-value, but in R the p-value is calculated under the assumption that the parameters of the reference distribution (the one you compare your sample with) are known and not estimated from the data.

In your example you estimate them from the data (by mean(w) and sd(w)), and therefore the p-values are not reliable. In effect you fit the theoretical distribution too well to your data (using the mean and sd estimated from the data). Hence the test is too conservative and the p-values are too large. Maybe SAS applies a correction for the estimation of the parameters and therefore gets smaller p-values, but this is pure speculation since I don't know how SAS does the calculation.
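One way to account for the estimated parameters is to simulate the null distribution of D when the mean and sd are re-estimated on every sample, which is the idea behind the Lilliefors variant of the test. The sketch below is only an illustration of that idea under stated assumptions: the helper ks.stat, the seed and the 2000 replicates are choices made here, and nothing in it is claimed to be what SAS actually computes.

```r
# Data as printed in the quoted question below
w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430, 3480,
       3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)

# Hypothetical helper: KS distance between x and a normal distribution
# whose mean and sd are estimated from x itself
ks.stat <- function(x) {
  unname(ks.test(x, "pnorm", mean(x), sd(x))$statistic)
}

set.seed(1)          # arbitrary seed, only for reproducibility
n.rep <- 2000        # assumed number of Monte Carlo replicates

d.obs <- ks.stat(w)  # observed statistic

# Null distribution of D when the parameters are re-estimated each time.
# D does not depend on the true mean and sd, so rnorm() defaults suffice.
d.sim <- replicate(n.rep, ks.stat(rnorm(length(w))))

# Monte Carlo p-value: share of simulated statistics at least as large
p.mc <- mean(d.sim >= d.obs)
p.mc
```

The resulting p-value should come out well below the one reported by the plain ks.test call, because the simulated null distribution reflects the extra closeness gained by estimating the parameters.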

I did a simulation: I created 10000 samples from a normal distribution and calculated ks.test for each one (with the mean and sd estimated from the sample, as in your call). I expected around 500 significant results (at the 0.05 level) by chance and got 1 or 2.
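The original code for this simulation is not part of the post. A minimal sketch of what it might look like follows; the sample size of 18 (chosen to match the data in the question) and the seed are assumptions, so the exact count will differ from the one quoted above.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n.sim <- 10000   # number of simulated samples, as in the text
n     <- 18      # assumed sample size, matching length(w) in the question

# For each normal sample, run ks.test with the mean and sd estimated
# from that same sample and keep the p-value
p.values <- replicate(n.sim, {
  x <- rnorm(n)
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# Number of "significant" results at the 0.05 level.
# A correctly calibrated test would give roughly 0.05 * n.sim = 500.
n.signif <- sum(p.values < 0.05)
n.signif
```

If the test were calibrated, the p-values would be roughly uniform on [0, 1]; a call such as hist(p.values) shows how far they are from that.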

I recommend using graphical methods (e.g. normal plots) to check the normality of your data instead of testing it.

See also ?qqnorm or ?qqplot.
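As an illustration, a normal Q-Q plot of the data from the question can be drawn as follows. This is only a sketch; the plot title is a choice made here.

```r
# Data as printed in the quoted question below
w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430, 3480,
       3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)

# Sample quantiles against theoretical normal quantiles;
# qqnorm() also returns the plotted coordinates invisibly
qq <- qqnorm(w, main = "Normal Q-Q plot of w")

# Reference line through the first and third quartiles
qqline(w)
```

Points that bend systematically away from the line, rather than scattering around it, suggest a departure from normality.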

Regards,

Christoph Buser

Christoph Buser <buser@stat.math.ethz.ch> Seminar fuer Statistik, LEO C11

ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-5414 fax: 632-1228

http://stat.ethz.ch/~buser/

Anthony Landrevie writes:

> Hello,
>
> While doing tests of normality under R and SAS, in order to prove the efficiency of R to my company, I noticed
> that the Anderson-Darling, Cramer-von Mises and Shapiro-Wilk test results are much the same under the two environments,
> but the Kolmogorov-Smirnov p-value really is different.
>
> Here is what I do:
>
> > ks.test(w,pnorm,mean(w),sd(w))
>
>         One-sample Kolmogorov-Smirnov test
>
> data:  w
> D = 0.2143, p-value = 0.3803
> alternative hypothesis: two.sided
>
> > w
>  [1] 3837 3334 2208 1745 2576 3208 3746 3523 3430 3480 3116 3428 2184 2383 3500 3866 3542
> [18] 3278
>
> SAS results:
>
> Kolmogorov-Smirnov D 0.214278 Pr > D 0.0271
>
> Why is the p-value so high under R? Much higher than with other tests.
>
> Best regards,
>
> Anthony Landrevie (French Student)
